METHODS OF ANALYSIS AND MANIPULATION OF
DNA UTILIZING MISMATCH REPAIR SYSTEMS 

DESCRIPTION 

This work was supported by the U.S. government, 
namely Grant No. GM23719. The U.S. government may have 
rights in this invention. 

FIELD OF THE INVENTION 
The present invention relates to methods for
mapping genetic differences among deoxyribonucleic acid
("DNA") molecules, especially mutations involving a
difference in a single base between the base sequences of
two homologous DNA molecules.

BACKGROUND OF THE INVENTION 
This application is a continuation-in-part of
Modrich et al., U.S. Serial No. 08/002,529, filed January
11, 1993, entitled "Methods For Mapping Genetic Mutations",
which is a continuation of U.S. Serial No. 07/350,983,
filed May 12, 1989, entitled "Methods For Mapping Genetic
Mutations", both hereby incorporated by reference herein,
including drawings.

The following is a discussion of relevant art,
none of which is admitted to be prior art to the appended
claims.

Mapping of genetic differences between
individuals is of growing importance for both forensic and
medical applications. For example, DNA "fingerprinting"
methods are being applied for identification of
perpetrators of crimes where even small amounts of blood
or sperm are available for analysis. Biological parents
can also be identified by comparing DNAs of a child and a
suspected parent using such means. Further, a number of
inherited pathological conditions may be diagnosed before
onset of symptoms, even in utero, using methods for
structural analyses of DNA. Finally, it is notable that
a major international effort to physically map and,
ultimately, to determine the sequence of bases in the DNA
encoding the entire human genome is now underway and
gaining momentum in both institutional and commercial
settings.

DNA molecules are linear polymers of subunits 
called nucleotides. Each nucleotide comprises a common
cyclic sugar molecule, which in DNA is linked by phosphate 
groups on opposite sides to the sugars of adjoining 
nucleotides, and one of several cyclic substituents called 
bases. The four bases commonly found in DNAs from natural 
sources are adenine, guanine, cytosine and thymine, 
hereinafter referred to as A, G, C and T, respectively. 
The linear sequence of these bases in the DNA of an 
individual encodes the genetic information that determines 
the heritable characteristics of that individual. 

In double-stranded DNA, such as occurs in the
chromosomes of all cellular organisms, the two DNA strands
are entwined in a precise helical configuration with the
bases projecting inward and so aligned as to allow
interactions between bases from opposing strands. The two
strands are held together in precise alignment mainly by
hydrogen bonds which are permitted between bases by a
complementarity of structures of specific pairs of bases.
This structural complementarity is determined by the
chemical natures and locations of substituents on each of
the bases. Thus, in double-stranded DNA, normally each A
on one strand pairs with a T from the opposing strand,
and, likewise, each G with an opposing C.
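
By way of illustration only, the pairing rule described
above can be expressed as a short Python sketch. The names
opposing_bases and find_mispairs are hypothetical and are
introduced here solely for illustration; for simplicity the
two strands are written so that paired bases share the same
index, and strand orientation is ignored.

```python
# Normal pairings stated in the text: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def opposing_bases(strand: str) -> str:
    """Return, position by position, the base expected opposite each base."""
    return "".join(PAIRS[base] for base in strand.upper())


def find_mispairs(strand: str, opposite: str) -> list[int]:
    """Return the indices at which two aligned strands are not complementary."""
    if len(strand) != len(opposite):
        raise ValueError("aligned strands must have equal length")
    return [
        i
        for i, (a, b) in enumerate(zip(strand.upper(), opposite.upper()))
        if PAIRS[a] != b
    ]
```

For example, find_mispairs("GATC", "CTGG") reports index 2,
where a T lies opposite a G.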

When a cell undergoes reproduction, its DNA
molecules are replicated and precise copies are passed on
to its descendants. The linear base sequence of a DNA
molecule is maintained in the progeny during replication
in the first instance by the complementary base pairings
which allow each strand of the DNA duplex to serve as a
template to align free nucleotides with its polymerized
nucleotides. The complementary nucleotides so aligned are
biochemically polymerized into a new DNA strand with a
base sequence that is entirely complementary to that of
the template strand.

Occasionally, an incorrect base pairing does
occur during replication, which, after further replication
of the new strand, results in a double-stranded DNA
offspring with a sequence containing a heritable single
base difference from that of the parent DNA molecule.
Such heritable changes are called genetic mutations, or
more particularly in the present case, "single base pair"
or "point" mutations. The consequences of a point
mutation may range from negligible to lethal, depending on
the location and effect of the sequence change in relation
to the genetic information encoded by the DNA.

The bases A and G are of a class of compounds
called purines, while T and C are pyrimidines. Whereas
the normal base pairings in DNA (A with T, G with C)
involve one purine and one pyrimidine, the most common
single base mutations involve substitution of one purine
or pyrimidine for the other (e.g., A for G or C for T, or
vice versa), a type of mutation referred to as a
"transition". Mutations in which a purine is substituted
for a pyrimidine, or vice versa, occur less frequently
and are called "transversions". Still less
common are point mutations comprising the addition or loss
of a small number (1, 2 or 3) of nucleotides arising in
one strand of a DNA duplex at some stage of the
replication process. Such mutations are called small
"insertions" or "deletions", respectively, and are also
known as "frameshift" mutations in the case of
insertion/deletion of one or two nucleotides, due to their
effects on translation of the genetic code into proteins.
Mutations involving larger sequence rearrangements also do
occur and can be important in medical genetics, but their
occurrences are relatively rare compared to the classes
summarized above.
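
The classes of point mutation summarized above can be
illustrated by the following Python sketch, which is offered
only as an example. The function names are hypothetical. The
modulo-three test in is_frameshift is an assumption that
generalizes the "one or two nucleotides" case stated above,
relying on the genetic code being read in triplets.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a single base substitution as a transition or a transversion."""
    original, replacement = original.upper(), replacement.upper()
    bases = PURINES | PYRIMIDINES
    if original not in bases or replacement not in bases:
        raise ValueError("bases must be A, G, C or T")
    if original == replacement:
        raise ValueError("no substitution: bases are identical")
    # Transition: purine for purine, or pyrimidine for pyrimidine.
    same_class = (original in PURINES) == (replacement in PURINES)
    return "transition" if same_class else "transversion"


def is_frameshift(nucleotides_changed: int) -> bool:
    """Small insertions or deletions shift the reading frame unless a multiple of 3."""
    if nucleotides_changed < 1:
        raise ValueError("at least one nucleotide must be inserted or deleted")
    return nucleotides_changed % 3 != 0
```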

Mapping of genetic mutations involves both the
detection of sequence differences between DNA molecules
comprising substantially identical (i.e., homologous) base
sequences, and also the physical localization of those
differences within some subset of the sequences in the
molecules being compared. In principle, it is possible to
both detect and localize limited genetic differences,
including point mutations within genetic sequences of two
individuals, by directly comparing the sequences of the
bases in their DNA molecules.

Other methods for detecting differences between
DNA sequences have been developed. For example, some
pairs of single-stranded DNA fragments with sequences
differing in a single base may be distinguished by their
different migration rates in electric fields, as in
denaturing gradient gel electrophoresis.

DNA restriction systems found in bacteria, for
example, comprise proteins which generally recognize
specific sequences in double-stranded DNA composed of 4 to
6 or more base pairs. In the absence of certain
modifications (e.g., a covalently attached methyl group)
at definite positions within the restriction recognition
sequence, endonuclease components of the restriction
system will cleave both strands of a DNA molecule at
specific sites within or near the recognition sequence.
Such short recognition sequences occur by chance in all
natural DNA sequences, once in every few hundred or
thousand base pairs, depending on the recognition sequence
length. Thus, digestion of a DNA molecule with various
restriction endonucleases, followed by analyses of the
sizes of the resulting fragments (e.g., by gel
electrophoresis), may be used to generate a physical map
("fingerprint") of the locations in a DNA molecule of
selected short sequences.

Comparisons of such restriction maps of two
homologous DNA sequences can reveal differences within
those specific sequences that are recognized by those
restriction enzymes used in the available maps.
Restriction map comparisons may localize any detectable
differences within limits defined ultimately by the
resolving power of DNA fragment size determination,
essentially within about the length of the restriction
recognition sequence under certain conditions of gel
electrophoresis.
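
The chance frequency of recognition sequences and the
comparison of fragment sizes described above may be sketched
as follows. This Python example is illustrative only: the
function names are hypothetical, the spacing estimate assumes
a random sequence with equal proportions of the four bases,
and the digest is idealized as complete cleavage of a linear
molecule at the start of each recognition sequence unless a
different cut_offset is given.

```python
def expected_site_spacing(site_length: int) -> int:
    """Average spacing, in base pairs, of a site of the given length in random DNA."""
    return 4 ** site_length


def fragment_sizes(sequence: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return the fragment lengths from an idealized complete digest of linear DNA."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length pieces that arise when a cut falls at a molecule end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# A single base change within one recognition sequence alters the fragment pattern.
normal = fragment_sizes("AAGATCTTTTGATCAA", "GATC")
variant = fragment_sizes("AAGATCTTTTGACCAA", "GATC")
```

Under these assumptions a 4 base pair site is expected about
once per 256 base pairs and a 6 base pair site about once per
4096 base pairs, consistent with the "few hundred or
thousand" estimate above.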

In practice, selected heritable differences in
restriction fragment lengths (i.e., restriction fragment
length polymorphisms, "RFLPs") have been extremely useful,
for instance, for generating physical maps of the human
genome on which genetic defects may be located with a
relatively low precision of hundreds or, sometimes, tens
of thousands of base pairs. Typically, RFLPs are detected
in human DNA isolated from small tissue or blood samples
by using radioactively labeled DNA fragments complementary
to the genes of interest. These "probes" are allowed to
form DNA duplexes with restriction fragments of the human
DNA after separation by electrophoresis, and the resulting
radioactive duplex fragments are visualized by exposure to
photographic (e.g., X-ray sensitive) film, thereby
allowing selective detection of only the relevant gene
sequences amid the myriad of others in the genomic DNA.

When the search for DNA sequence differences can
be confined to specific regions of known sequence, the
recently developed "polymerase chain reaction" ("PCR")
technology can be used. Briefly, this method utilizes
short DNA fragments complementary to sequences on either
side of the location to be analyzed to serve as points of
initiation for DNA synthesis (i.e., "primers") by purified
DNA polymerase. The resulting cyclic process of DNA
synthesis results in massive biochemical amplification of
the sequences selected for analysis, which then may be
easily detected and, if desired, further analyzed, for
example, by restriction mapping or direct DNA sequencing
methods. In this way, selected regions of a human gene
comprising a few kbp may be amplified and examined for
sequence variations.

Another known method for detecting and localizing
single base differences within homologous DNA molecules
involves the use of a radiolabeled RNA fragment with base
sequence complementary to one of the DNAs and a nuclease
that recognizes and cleaves single-stranded RNA. The
structure of RNA is highly similar to DNA, except for a
different sugar and the presence of uracil (U) in place of
T; hence, RNA and DNA strands with complementary sequences
can form helical duplexes ("DNA:RNA hybrids") similar to
double-stranded DNA, with base pairing between A's and U's
instead of A's and T's. It is known that the enzyme
ribonuclease A ("RNase A") can recognize some single pairs
of mismatched bases (i.e., "base mispairs") in DNA:RNA
hybrids and can cleave the RNA strand at the mispair site.
Analysis of the sizes of the products resulting from RNase
A digestion allows localization of single base mismatches,
potentially to the precise sequence position, within
lengths of homologous sequences determined by the limits
of resolution of the RNA sizing analysis (Myers, R. M. et
al., 1985, Science, 230, 1242-1246). RNA sizing is
performed in this method by standard gel electrophoresis
procedures used in DNA sequencing.
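
The relationship between the position of a mispair and
the sizes of the cleavage products can be illustrated with
the following Python sketch. It is an idealization offered
only as an example: the function name is hypothetical, and
complete cleavage of the labeled RNA strand at every listed
mispair position is assumed.

```python
def cleavage_products(probe_length: int, mispair_positions: list[int]) -> list[int]:
    """Return product lengths after cleaving a probe at each mispair position.

    Positions are counted in nucleotides from one end of the probe and must
    fall strictly inside it.
    """
    cuts = sorted(set(mispair_positions))
    for position in cuts:
        if not 0 < position < probe_length:
            raise ValueError("mispair position must lie inside the probe")
    bounds = [0] + cuts + [probe_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For instance, a 500 nucleotide probe cleaved at a single
mispair 180 nucleotides from one end would yield products of
180 and 320 nucleotides, whereas a fully matched probe would
remain at 500 nucleotides.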

S1 nuclease, an endonuclease specific for single-
stranded nucleic acids, can recognize and cleave limited
regions of mismatched base pairs in DNA:DNA or DNA:RNA
duplexes. A mismatch of at least about 4 consecutive base
pairs actually is generally required for recognition and
cleavage of a duplex by S1 nuclease.

Ford et al. (U.S. Patent 4,794,075) disclose a
chemical modification procedure to detect and localize
mispaired guanines and thymidines and to fractionate a
pool of hybrid DNA from two samples obtained from related
individuals. Carbodiimide is used to specifically
derivatize unpaired G's and T's, which remain covalently
associated with the DNA helix.

The present invention concerns use of proteins
that function biologically to recognize mismatched base
pairs in double-stranded DNA (and, therefore, are called
"mispair recognition proteins") and their application in
defined systems for detecting and mapping point mutations
in DNAs. Accordingly, it is an object of the present
invention to provide methods for using such mispair
recognition proteins, alone or in combination with other
proteins, for detecting and localizing base pair
mismatches in duplex DNA molecules, particularly those
DNAs comprising several kbp, and manipulating molecules
containing such mismatches. Additionally, it is an object
of this invention to develop modified forms of mispair
recognition proteins to further simplify methods for
identifying specific bases which differ between DNAs. The
following is a brief outline of the art regarding mispair
recognition proteins and systems, none of which is
admitted to be prior art to the present invention.

Enzymatic systems capable of recognition and
correction of base pairing errors within the DNA helix
have been demonstrated in bacteria, fungi and mammalian
cells, but the mechanisms and functions of mismatch
correction are best understood in Escherichia coli. One
of the several mismatch repair systems that have been
identified in E. coli is the methyl-directed pathway for
repair of DNA biosynthetic errors. The fidelity of DNA
replication in E. coli is enhanced 100-1000 fold by this
post-replication mismatch correction system. This system
processes base pairing errors within the helix in a
strand-specific manner by exploiting patterns of DNA
methylation. Since DNA methylation is a post-synthetic
modification, newly synthesized strands temporarily exist
in an unmethylated state, with the transient absence of
adenine methylation on GATC sequences directing mismatch
correction to new DNA strands within the hemimethylated
duplexes.

In vivo analyses in E. coli have shown that
selected examples of each of the different mismatches are
subject to correction with different efficiencies. G-T,
A-C, G-G and A-A mismatches are typically subject to
efficient repair. A-G, C-T, T-T and C-C are weaker
substrates, but well repaired exceptions exist within this
class. The sequence environment of a mismatched base pair
may be an important factor in determining the efficiency
of repair in vivo. The mismatch correction system is also
capable in vivo of correcting differences between duplexed
strands involving a single base insertion or deletion.
Further, genetic analyses have demonstrated that the
mismatch correction process requires intact genes for
several proteins, including the products of the mutH, mutL
and mutS genes, as well as DNA helicase II and single-
stranded DNA binding protein (SSB). The following are
further examples of art discussing this subject matter.

Lu et al., 80 Proc. Natl. Acad. Sci. USA 4639,
1983 disclose the use of a soluble E. coli system to
support mismatch correction in vitro.

Pang et al., 163 J. Bact. 1007, 1985 disclose
cloning of the mutS and mutL genes of Salmonella
typhimurium.

The specific components of the E. coli mispair
correction system have been isolated and the biochemical
functions determined. Preparation of MutS protein
substantially free of other proteins has been reported (Su
and Modrich, 1986, Proc. Nat. Acad. Sci. U.S.A., 84, 5057-
5061, which is hereby incorporated herein by reference).
The isolated MutS protein was shown to recognize four of
the eight possible mismatched base pairs (specifically, G-
T, A-C, A-G and C-T mispairs).
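
The count of eight possible mismatched base pairs
referred to above follows from simple enumeration, which the
following Python sketch illustrates. The function name is
hypothetical; the sketch treats a mispair as an unordered
pair of bases, so that G-T and T-G are counted once.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
# The two normal pairings, stored as unordered pairs.
NORMAL_PAIRS = {frozenset("AT"), frozenset("CG")}


def possible_mismatches() -> list[str]:
    """List every unordered pairing of two bases that is not a normal base pair."""
    return [
        f"{a}-{b}"
        for a, b in combinations_with_replacement(BASES, 2)
        if frozenset((a, b)) not in NORMAL_PAIRS
    ]
```

Of the ten unordered pairings of four bases, two are the
normal A-T and G-C pairs, leaving the eight mispairs
discussed herein.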

Su et al., 263 J. Biol. Chem. 6829, 1988 disclose
that the mutS gene product binds to each of the
eight base pair mismatches and does so with differential
efficiency.

Jiricny et al., 16 Nucleic Acids Research 7843,
1988 disclose binding of the mutS gene product of E. coli
to synthetic DNA duplexes containing mismatches to
correlate recognition of mispairs and efficiency of
correction in vivo. Nitrocellulose filter binding assays
and band-shift assays were utilized.

Welsh et al., 262 J. Biol. Chem. 15624, 1987
purified the product of the MutH gene to near homogeneity
and demonstrated the MutH gene product to be responsible
for d(GATC) site recognition and to possess a latent
endonuclease that incises the unmethylated strand of
hemimethylated DNA 5' to the G of d(GATC) sequences.

Au et al., 267 J. Biol. Chem. 12142, 1992
indicate that activation of the MutH endonuclease requires
MutS, MutL and ATP.

Grilley et al., 264 J. Biol. Chem. 1000, 1989
purified the E. coli mutL gene product to near homogeneity
and indicate that the mutL gene product interacts with
the MutS-heteroduplex DNA complex.

Lahue et al., 245 Science 160, 1989 delineate the
components of the E. coli methyl-directed mismatch repair
system that function in vitro to correct seven of the
eight possible base pair mismatches. Such a reconstituted
system consists of MutH, MutL, and MutS proteins, DNA
helicase II, single-strand DNA binding protein, DNA
polymerase III holoenzyme, exonuclease I, DNA ligase, ATP,
and the four deoxyribonucleoside triphosphates.

Su et al., 31 Genome 104, 1989 indicate that
under conditions of restricted DNA synthesis, or limiting
concentration of dNTPs, or by supplementing a reaction
with a ddNTP, there is the formation of excision tracts
consisting of single-stranded gaps in the region of the
molecule containing a mismatch and a d(GATC) site.

Grilley et al., 268 J. Biol. Chem. 11830, 1993,
indicate that excision tracts span the shorter distance
between a mismatch and the d(GATC) site, indicating a
bidirectional capacity of the methyl-directed system.

Holmes et al., 87 Proc. Natl. Acad. Sci. USA
5837, 1990, disclose that nuclear extracts derived from HeLa
and Drosophila melanogaster Kc cell lines support
strand-specific mismatch correction in vitro.

Cooper et al., 268 J. Biol. Chem. 11823, 1993,
describe a role for RecJ and Exonuclease VII as a 5' to 3'
exonuclease in a mismatch repair reaction. In
reconstituted systems such a 5' to 3' exonuclease function
had been provided by certain preparations of DNA
polymerase III holoenzyme.

Au et al., 86 Proc. Natl. Acad. Sci. USA 8877,
1989 describe purification of the MutY gene product of E.
coli to near homogeneity, and state that the MutY protein
is a DNA glycosylase that hydrolyzes the glycosyl bond
linking a mispaired adenine (G-A) to deoxyribose. The
MutY protein, an apurinic endonuclease, DNA polymerase I,
and DNA ligase were shown to reconstitute G-A to G-C
mismatch correction in vitro.

A role for the E. coli mismatch repair system in
controlling recombination between related but non-allelic
sequences has been indicated (Feinstein and Low, 113
Genetics 13, 1986; Rayssiguier, 342 Nature 396, 1989;
Shen, 218 Mol. Gen. Genetics 358, 1989; Petit, 129
Genetics 327, 1991). Crossovers between
sequences which differ by a few percent or more at the
base pair level are rare. In bacterial mutants deficient
in methyl-directed mismatch repair, the frequency of such
events increases dramatically. The largest increases are
observed in MutS and MutL deficient strains (Rayssiguier,
supra; and Petit, supra).

Nelson et al., 4 Nature Genetics 11, 1993,
disclose a genomic mismatch scanning (GMS) method for genetic
linkage analysis. The method allows DNA fragments from
regions of identity-by-descent between two relatives to be
isolated based on their ability to form mismatch-free
hybrid molecules.

The method consists of digesting DNA from the two
sources with a restriction endonuclease that produces
protruding 3' ends. The protruding 3' ends provide some
protection from exonuclease III, which is used in later
steps. The two sources are distinguished by methylating
the DNA from only one source. Molecules from both sources
are denatured and reannealed, resulting in the formation
of four types of duplex molecules: homohybrids formed
from strands derived from the same source and
heterohybrids consisting of DNA strands from different
sources. Heterohybrids can either be mismatch free or
contain base-pair mismatches, depending on the extent of
identity of homologous regions.

Homohybrids are distinguished from heterohybrids
by use of restriction endonucleases that cleave at fully
methylated or unmethylated GATC sites. Homohybrids are
cleaved to smaller duplex molecules, while heterohybrids
are resistant to cleavage. Heterohybrids containing a
mismatch(es) are distinguished from mismatch-free
molecules by use of the E. coli methyl-directed mismatch
repair system. The combination of three proteins of the
methyl-directed mismatch repair system, MutH, MutL, and
MutS, along with ATP introduces a single-strand nick on the
unmethylated strand at GATC sites in duplexes that contain
a mismatch. Heterohybrids that do not contain a mismatch
are not nicked. All molecules are then subject to
digestion by Exonuclease III (Exo III), which can initiate
digestion at a nick, a blunt end or a 5' overhang, to
produce single-stranded gaps. Only mismatch-free
heterohybrids are not subject to attack by Exo III; all
other molecules have single-stranded gaps introduced by
the enzyme. Molecules with single-stranded regions are
removed by adsorption to benzoylated naphthoylated DEAE
cellulose. The remaining molecules consist of mismatch-
free heterohybrids which may represent regions of
identity-by-descent.
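
The selection logic of the GMS procedure as described
above can be summarized in the following Python sketch,
offered only as an illustration. The function name and the
returned labels are hypothetical, and each enzymatic step is
idealized as acting completely and specifically.

```python
def gms_fate(first_strand_methylated: bool,
             second_strand_methylated: bool,
             has_mismatch: bool) -> str:
    """Return the idealized fate of one reannealed duplex in the GMS procedure."""
    # Homohybrids: both strands from the same source, so both strands are
    # methylated or both are unmethylated; they are cleaved at GATC sites.
    if first_strand_methylated == second_strand_methylated:
        return "cleaved_and_removed"
    # Heterohybrids are hemimethylated. MutH, MutL, MutS and ATP nick the
    # unmethylated strand only when a mismatch is present; Exo III then
    # converts the nick to a gap and the molecule is removed.
    if has_mismatch:
        return "nicked_and_removed"
    # Mismatch-free heterohybrids survive all selection steps.
    return "retained"
```

Only the combination of a hemimethylated duplex and the
absence of a mismatch yields "retained", corresponding to
candidate regions of identity-by-descent.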



SUMMARY OF THE INVENTION

Applicant has determined that a single DNA base
mispair recognition protein can form specific complexes
with any of the eight possible mismatched base pairs
embedded in an otherwise homologous DNA duplex. It has
also been revealed that another mispair recognition
protein can recognize primarily one specific base pair
mismatch, A-G, and in so doing, it chemically modifies a
nucleotide at the site of the mispair. In addition,
defined in vitro systems have been established for
carrying out methyl-directed mismatch repair processes.
Accordingly, the present invention features the use of
such mispair recognition proteins and related correction
system components to detect and to localize point
mutations in DNAs. In addition the invention concerns
methods for the analysis and manipulation of populations
of DNA duplex molecules potentially containing base pair
mismatches through the use of all or part of defined
mismatch repair systems.

The invention utilizes five basic methods for
heteroduplex mapping, analysis, and manipulation: (i)
binding of a mismatch recognition protein, e.g., MutS, to
DNA molecules containing one or more mispairs; (ii)
cleavage of a heteroduplex in the vicinity of a mismatch
by a modified form of a mismatch recognition protein;
(iii) mismatch-provoked cleavage at one or more GATC sites
via a mismatch repair system dependent reaction, e.g.,
MutHLS; (iv) formation of a mismatch-provoked gap in
heteroduplex DNA via reactions of a mismatch repair system;
and (v) labelling of mismatch-containing nucleotides with
a nucleotide analog, e.g., a biotinylated nucleotide,
using a complete mismatch repair system.






WO 95/12688 



PCT/US94/12501 




For clarity in the following discussion, it
should be noted that certain distinctions exist related to
the fact that some proteins that recognize DNA base
mispairs are merely DNA binding proteins, while others
modify the DNA as a consequence of mispair recognition.
Notwithstanding the fact that in the latter situation the
protein modifying the DNA may be associated with the DNA
only transiently, hereinafter, whether a mispair
recognition protein is capable of DNA binding only or also
of modifying DNA, whenever it is said that a protein
recognizes a DNA mispair, this is equivalent to saying
that it "forms specific complexes with" or "binds
specifically to" that DNA mispair in double-stranded DNA.
In the absence of express reference to modification of
DNA, reference to DNA mispair recognition does not imply
consequent modification of the DNA. Further, the phrase
"directs modification of DNA" includes both cases wherein
a DNA mispair recognition protein has an inherent DNA
modification function (e.g., a glycosylase) and cases
wherein the mispair recognition protein merely forms
specific complexes with mispairs, which complexes are then
recognized by other proteins that modify the DNA in the
vicinity of the complex.

Accordingly, the present invention features a
method for detecting base pair mismatches in a DNA duplex
by utilizing a mismatch recognition protein that forms
specific complexes with mispairs, and detecting the
resulting DNA:protein complexes by a suitable analytical
method.

In addition to methods designed merely to detect
base pair mismatches, this invention includes methods for
both detecting and localizing base pair mismatches by
utilizing components of a mismatch repair system.

The present invention also features mispair
recognition proteins which have been altered to provide an
inherent means for modifying at least one strand of the
DNA duplex in the vicinity of the bound mispair
recognition protein.

The present invention also concerns systems
utilizing an A-G specific mispair recognition protein, for
example, the E. coli DNA mispair recognition protein that
recognizes only A-G mispairs without any apparent
requirement for hemimethylation. This protein, the
product of the mutY gene, is a glycosylase which
specifically removes the adenine from an A-G mispair in a
DNA duplex. Accordingly, this MutY protein is useful for
the specific detection of A-G mispairs according to the
practice of the present invention.

The invention also includes the combined use of
components of a mismatch repair system along with a
recombinase protein. The recombinase protein functions to
catalyze the formation of duplex molecules starting with
single-stranded molecules obtained from different sources,
by a renaturation reaction. Such a recombinase protein is
also capable of catalyzing a strand transfer reaction
between a single-stranded molecule from one source and
double-stranded molecules obtained from a different
source. In the presence of a base pair mismatch,
formation of duplex regions catalyzed by such a
recombinase protein is inhibited by components of a
mismatch repair system, e.g., E. coli MutS and MutL
proteins. Modulation of recombinase activity by
components of a mismatch repair system may involve
inhibition of branch migration through regions that
generate mismatched base pairs. The combination of a DNA
mismatch repair system and a recombinase system provides
a very sensitive selection step allowing for the removal
of molecules containing a base pair mismatch from a
population of newly formed heteroduplex molecules. This
procedure provides a selection scheme that can be utilized
independent of or in conjunction with the actual mismatch
repair reaction.

The invention also features two improvements on
the genomic mismatch scanning technique (GMS) of Nelson et
al., 4 Nature Genetics 11, 1993, used to map regions of
genetic identity between populations of DNA molecules.

One improvement provided by the invention
features an additional selection step, as described above,
for determining genetic variation. The genomic mismatch
scanning (GMS) method includes one selection step which is
carried out after hybrid formation. The present invention
includes an additional step that occurs during hybrid
formation, through the use of a protein with recombinase
activity along with components of a mismatch repair
system. The increase in sensitivity for screening for
genetic variation provided by the additional selection
step makes possible the use of the GMS technique with
larger genomes, e.g., man.

A second improvement provided by the invention
features the replacement or modification of the
exonuclease III digestion step employed in the GMS method. In
the GMS procedure exonuclease III is used to degrade all
DNA molecules, except mismatch-free heterohybrids, to
molecules containing single-stranded regions, which are
subsequently removed. Heterohybrids are duplex molecules
which are formed in the method from two molecules which
were previously base paired with other molecules (i.e.,
from different sources). In the instant invention this
step is replaced by a procedure that employs all or some
of the components of a mismatch repair system. Exo III is
a 3' to 5' exonuclease specific for double-stranded DNA,
which preferably initiates at blunt or 5' protruding ends.
In the GMS procedure DNA molecules are digested with
restriction enzymes that produce protruding 3' ends.
Although molecules containing protruding 3' ends are not
preferred substrates for Exo III, such molecules can be
subject to limited attack by the enzyme. Thus, even
mismatch-free heterohybrids will be degraded to some
extent by Exo III, and will be erroneously removed from
the final population of molecules representing those of
identity-by-descent. The invention employs components of
a mismatch repair system along with dideoxy or
biotinylated nucleotide, to avoid the use of Exo III and
the potential loss of heterohybrid molecules that are
mismatch-free. Homohybrids are digested in the presence
of helicase II by exo VII, RecJ and exo I, e.g., natural
exonucleases involved in the mismatch repair reaction.
The invention also features a modification of the step
utilizing Exo III, consisting of ligation of duplex DNA
molecules at dilute concentrations so as to form closed
circular monomer molecules, thus removing any 3' ends
which may be subject to degradation by Exo III.

The invention includes the use of a mismatch
repair system to detect and remove or correct base pair
mismatches in a population produced by the process of
enzymatic amplification of nucleic acid molecules. DNA
polymerase errors that occur during a cycle of enzymatic
amplification can result in the presence of mismatched
base pair(s) in the population of product molecules. If
such errors are perpetuated in subsequent cycles they can
impair the value of the final amplified product. The
fidelity of the amplification method can be enhanced by
including one or more components of a mismatch repair
system to either correct the mismatched base pair(s) or to
eliminate from the amplified population molecules that
contain mismatched base pair(s). Elimination of molecules
containing a base pair mismatch can be accomplished by
binding to a protein, such as MutS, or by introduction of
a nick in one strand of the duplex so that a full sized
product will not be produced in a subsequent round of
amplification.

The invention also features methods to remove
molecules containing a base pair mismatch through the
binding of the mismatch to the components of the mismatch
repair system or by the binding of a complex of a mismatch
and components of a mismatch repair system to other
cellular proteins. Another aspect of the invention for
removal of molecules containing a mismatch is through the
incorporation of biotin into such a molecule and
subsequent removal by binding to avidin.

Another aspect of the invention features use of 
a mismatch repair system which has a defined 5' to 3' 
exonuclease function that is provided by exonuclease 
VII or RecJ exonuclease. In other systems a 5' to 3' 
exonuclease function is provided by exonuclease VII which 
is present in many preparations of the DNA polymerase III 
holoenzyme. 

The invention also includes kits having 
components necessary to carry out the methods of the 
invention. 

The mismatch repair systems of the instant 
invention, e.g., that of E. coli, offer specific and efficient 
procedures for detection and localization of mismatches 
and manipulation of DNA containing mismatches that are a 
reflection of their biological function. All eight 
possible base pair mismatches are recognized and seven of 
the eight mismatches are processed and corrected by the 
system. Although C-C mismatches are not a substrate for 
repair, MutS does bind weakly to this mispair, permitting 
its detection. In contrast to the electrophoretic 
migration procedure, the RNase method, or chemical 
modification procedures, the system does not depend on the 
destabilization of the DNA helix for detection of 
mismatches or binding to mismatches. The system features 
exquisite specificity, and is not subject to non-specific 
interactions with bases at the ends of linear DNA 
fragments or non-specific interactions at non-mismatch 
sites in long molecules. 
The detection of fragments containing a mispair 
is limited only by the intrinsic specificity of the 
system, for example, detection of better than one G-T 
mispair per 300 kilobases. Mismatches have been routinely 
detected with a 6,400 base pair substrate and the system 
should be applicable to molecules as large as 40-50 kb. 
This allows for detection of possible single base 
differences between long DNA sequences, for example, 
between a complete gene from one individual and the entire 
genome of another. The invention also enables the 
localization of any possible single base difference within 
the sequences of homologous regions of long DNA molecules 
such as those encoding one or more complete genes and 
comprising several kbp of DNA. 

Several of the methods of the invention result in 
the covalent alteration of the phosphodiester backbone of 
DNA molecules. This covalent alteration facilitates 
analysis of the product DNA molecules especially by 
electrophoretic methods. 

Other features and advantages of the invention 
will be apparent from the following description of the 
preferred embodiments thereof, and from the claims. 

BRIEF DESCRIPTION OF THE FIGURES 
Fig. 1. Heteroduplex substrate for in vitro 
mismatch correction. The substrate used in some examples 
is a 6440-bp, covalently closed, circular heteroduplex 
that is derived from bacteriophage f1 and contains a 
single base-base mismatch located within overlapping 
recognition sites for two restriction endonucleases at 
position 5632. In the example shown a G-T mismatch 
resides within overlapping sequences recognized by Hind 
III and Xho I endonucleases. Although the presence of the 
mispair renders this site resistant to cleavage by either 
endonuclease, repair occurring on the complementary (c) 
DNA strand yields an A-T base pair and generates a Hind 
III-sensitive site, while correction on the viral (v) 
strand results in a G-C pair and Xho I sensitivity. The 
heteroduplexes also contain a single d(GATC) sequence 1024 
base pairs from the mismatch (shorter path) at position 
216. The state of strand methylation at this site can be 
controlled, thus permitting evaluation of the effect of 
DNA methylation on the strand specificity of correction. 
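
The coordinates quoted for Fig. 1 can be checked with a short calculation. The sketch below is illustrative only: the function name `shorter_path` is not from the source, and it assumes that positions are numbered around a circle of the stated length (6440 bp).

```python
def shorter_path(pos_a: int, pos_b: int, circle_length: int) -> int:
    """Return the shorter of the two arc distances between two
    positions on a circular molecule of the given length."""
    direct = abs(pos_a - pos_b) % circle_length
    return min(direct, circle_length - direct)


# Mismatch at position 5632, d(GATC) at position 216, 6440-bp circle.
print(shorter_path(5632, 216, 6440))  # prints 1024
```

The result agrees with the 1024 base pair separation given in the legend; the longer path around the circle is 5416 base pairs.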
Fig. 2. Requirement for DNA ligase in mismatch 
correction. Hemimethylated G-T heteroduplex DNA [Fig. 1, 
0.6 µg, d(GATC) methylation on the complementary DNA 
strand] was subjected to mismatch repair under 
reconstituted conditions in a 60 µl reaction (Table 3, 
closed circular heteroduplex), or in 20 µl reactions (0.2 
µg of DNA) lacking MutS protein or ligase, or lacking both 
activities. A portion of each reaction (0.1 µg of DNA) 
was treated with EDTA (10 mM final concentration) and 
subjected to agarose gel electrophoresis in the presence 
of ethidium bromide (1.5 µg/ml; top panel, lanes 1-4). 
Positions are indicated for the unreacted, supercoiled 
substrate (SC), open circles containing a strand break 
(OC) and covalently closed, relaxed circular molecules 
(RC). A second sample of each reaction containing 0.1 µg 
of DNA was hydrolyzed with Xho I and Cla I endonucleases 
(Fig. 1) to score G-T to G-C mismatch correction and 
subjected to electrophoresis in parallel with the samples 
described above (bottom panel, lanes 5-8). The remainder 
of the complete reaction (0.4 µg DNA, corresponding to the 
sample analyzed in lane 1) was made 10 mM in EDTA, and 
subjected to electrophoresis as described above. A gel 
slice containing closed circular, relaxed molecules was 
excised and the DNA eluted. This sample was cleaved with 
Xho I and Cla I and the products analyzed by 
electrophoresis (lane 9). 

Fig. 3. Methyl-direction of mismatch correction 
in the purified system. Repair reactions with the G-T 
heteroduplex (Fig. 1) were performed as described in Table 
3 (closed circular heteroduplex) except that reaction 
volumes were 20 µl (0.2 µg of DNA) and the incubation 
period was 60 minutes. The reactions were heated to 55° 
for 10 minutes and each was divided into two portions to 
test strand specificity of repair. G-T to A-T mismatch 
correction, in which repair occurred on the complementary 
(c) DNA strand, was scored by cleavage with Hind III and 
Cla I endonucleases, while hydrolysis with Xho I and Cla 
I was used to detect G-T to G-C repair occurring on the 
viral (v) strand. Apart from the samples shown in the 
left two lanes, all heteroduplexes were identical except 
for the state of methylation of the single d(GATC) 
sequence at position 216 (Fig. 1). The state of 
modification of the two DNA strands at this site is 
indicated by + and - notation. The G-T heteroduplex used 
in the experiment shown in the left two lanes (designated 
0/0) contains the sequence d(GATT) instead of d(GATC) at 
position 216, but is otherwise identical in sequence to 
the other substrates. 

Fig. 4. Strand-specific repair of heteroduplexes 
containing a single strand scission in the absence of MutH 
protein. Hemimethylated G-T heteroduplex DNAs (Fig. 1, 5 
µg) bearing d(GATC) modification on the viral or 
complementary strand were subjected to site-specific 
cleavage with near homogeneous MutH protein. Because the 
MutH-associated endonuclease is extremely weak in the 
absence of other mismatch repair proteins, cleavage at 
d(GATC) sites by the purified protein requires a MutH 
concentration 80 times that used in reconstitution 
reactions. After removal of MutH by phenol extraction, 
DNA was ethanol precipitated, collected by centrifugation, 
dried under vacuum, and resuspended in 10 mM Tris-HCl (pH 
7.6), 1 mM EDTA. Mismatch correction of MutH-incised and 
covalently closed, control heteroduplexes was performed as 
described in the legend to Table 2 except that ligase and 
NAD+ were omitted. Outside and inside strands of the 
heteroduplexes depicted here correspond to complementary 
and viral strands respectively. Values in parentheses 
indicate repair occurring on the methylated, continuous 
DNA strand. The absence of MutH protein in preparations 
of incised heteroduplexes was confirmed in two ways. 
Preparations of incised molecules were subject to closure 
by DNA ligase (> 80%), demonstrating that MutH protein does 
not remain tightly bound to incised d(GATC) sites. 
Further, control experiments in which each MutH-incised 
heteroduplex was mixed with a closed circular substrate 
showed that only the open circular form was repaired if 
MutH protein was omitted from the reaction whereas both 
substrates were corrected if MutH protein was present 
(data not shown). 

Fig. 5. Requirements for MutH protein and a 
d(GATC) sequence for correction in the presence of DNA 
ligase. Hemimethylated G-T heteroduplexes incised on the 
unmethylated strand at the d(GATC) sequence were prepared 
as described above in Fig. 4. A G-T heteroduplex devoid 
of d(GATC) sites (Fig. 4) and containing a single-strand 
break within the complementary DNA strand at the Hinc II 
site (position 1) was constructed as described previously 
(Lahue et al., supra). Mismatch correction assays were 
performed as described in Table 3, with ligase (20 ng in 
the presence of 25 µM NAD+) and MutH protein (0.26 ng) 
present as indicated. Table entries correspond to 
correction occurring on the incised DNA strand, with 
parenthetic values indicating the extent of repair on the 
continuous strand. Although not shown, repair of the 
nicked molecule lacking a d(GATC) sequence (first entry of 
column 3) was reduced more than an order of magnitude upon 
omission of MutL, MutS, SSB or DNA polymerase III 
holoenzyme. 

Fig. 6 is a diagrammatic representation of the 
model system used to evaluate MutS and MutL effects on 
RecA catalyzed strand transfer. 

Fig. 7 depicts the effects of MutS and MutL on 
RecA-catalyzed strand transfer between homologous and 
quasi-homologous DNA sequences. Solid lines indicate 
fd-fd strand transfer, while dashed lines correspond to 
fd-M13 strand transfer. Strand transfer was evaluated in the 
presence of MutL (solid circles) or MutS (solid squares). 

Fig. 8 depicts the MutL potentiation of the MutS 
block to strand transfer in response to mismatched base 
pairs. Solid lines: fd-fd strand transfer; dashed lines: 
fd-M13 strand transfer; RecA (solid circle); RecA and MutL 
(solid diamond); RecA and MutS (solid square); RecA, MutL, 
and MutS (solid triangle). 

Fig. 9 depicts the MutS and MutL block of branch 
migration through regions that generate mismatched base 
pairs. Solid lines: M13-M13 strand transfer; dashed line: 
fd-M13 strand transfer. RecA only (solid circle and 
square); RecA, MutS, and MutL (open circle and square). 

DESCRIPTION OF PREFERRED EMBODIMENTS 

The invention consists of methods utilizing and 
kits consisting of components of a mismatch repair system to 
detect and localize DNA base pair mismatches and 
manipulate molecules containing such mismatches. The 
invention also features modified mispair recognition 
proteins and their utilization in the above-mentioned 
methods and kits. The invention also includes methods 
and kits comprising components of a mismatch repair system 
along with proteins with recombinase activity. The 
invention also consists of methods to improve the GMS 
technique to detect regions of homology-by-descent. 

Methods for detecting the presence and localization of 
mismatched base pairs by complex formation with a mismatch 
recognition protein 

One embodiment of the invention features a 
diagnostic method for detecting a base pair mismatch in a 
DNA duplex. The method comprises the steps of contacting 
at least one strand of a first DNA molecule with the 
complementary strand of a second DNA molecule under 
conditions such that base pairing occurs, contacting a DNA 
duplex potentially containing a base pair mismatch with a 
mispair recognition protein under conditions suitable for 
the protein to form a specific complex only with the DNA 
duplex having a base pair mismatch, and not with a DNA 
duplex lacking a base pair mismatch, and detecting the 
complex as a measure of the presence of a base pair 
mismatch in the DNA duplex. 

By "mismatch" is meant an incorrect pairing 
between the bases of two nucleotides located on 
complementary strands of DNA, i.e., base pairs that are 
not A:T or G:C. 
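
The definition above can be stated operationally. The following is a minimal sketch, not part of the described methods: it assumes the two strands are supplied as strings of equal length with the second strand written 3' to 5', so that the bases at the same index face each other in the duplex. The name `find_mismatches` is hypothetical.

```python
# The only pairings that are not mismatches under the definition above.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_mismatches(top: str, bottom: str) -> list[tuple[int, str, str]]:
    """Return (index, top_base, bottom_base) for every position whose
    pairing is not A:T or G:C.

    `top` is read 5'->3' and `bottom` 3'->5', so index i of each string
    refers to the same base pair.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    return [
        (i, t, b)
        for i, (t, b) in enumerate(zip(top.upper(), bottom.upper()))
        if (t, b) not in WATSON_CRICK
    ]


# A six-base duplex carrying a single G-T mispair at index 2.
print(find_mismatches("AAGCTT", "TTTGAA"))  # prints [(2, 'G', 'T')]
```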

In the practice of this method, the two DNAs or 
two DNA samples to be compared may comprise natural or 
synthetic sequences encoding up to the entire genome of an 
organism, including man, which can be prepared by well 
known procedures. Detection of base sequence differences 
according to this method of this invention does not 
require cleavage (by a restriction nuclease, for example) 
of either of the two DNAs, although it is well known in 
the art that the rate of base pair formation between 
complementary single-stranded DNA fragments is inversely 
related to their size. This detection method requires 
that base sequence differences to be detected as base 
pair mismatches lie within a region of homology 
constituting at least about 14 consecutive base pairs of 
homology between the two DNA molecules, which is about the 
minimum number of base pairs generally required to form a 
stable DNA duplex. Either one or both of the strands of 
the first DNA may be selected for examination, while at 
least one strand of the second DNA complementary to a 
selected first DNA strand must be used. The DNA strands, 
particularly those of the second DNA, advantageously may 
be radioactively labeled to facilitate direct detection, 
according to procedures well known in the art. 
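
The 14 base pair requirement can be illustrated with a simple screen. This sketch rests on an assumption that is not stated in the source: it treats "region of homology" as an uninterrupted run of A:T or G:C pairs and checks whether any such run reaches the threshold. The names `paired_runs` and `has_stable_run` are hypothetical, and the bottom strand is again taken to be written 3' to 5'.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
MIN_STABLE_RUN = 14  # "at least about 14 consecutive base pairs"


def paired_runs(top: str, bottom: str) -> list[int]:
    """Lengths of the uninterrupted runs of Watson-Crick pairs."""
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    runs, current = [], 0
    for t, b in zip(top, bottom):
        if COMPLEMENT.get(t) == b:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def has_stable_run(top: str, bottom: str, min_run: int = MIN_STABLE_RUN) -> bool:
    """True if at least one perfectly paired run reaches min_run."""
    return any(run >= min_run for run in paired_runs(top, bottom))


# One G-T mispair flanked by 14 and 5 paired positions.
print(paired_runs("A" * 14 + "G" + "A" * 5, "T" * 20))  # prints [14, 5]
```

Real hybridization stability also depends on base composition, temperature and salt, so a count of this kind is only a rough first filter.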

By "mispair recognition protein" is meant a 
protein of a mismatch repair system that specifically 
recognizes and binds to a base pair mismatch, e.g., E. coli 
MutS. 

Methods and conditions for contacting the DNA 
strands of the two DNAs under conditions such that base 
pairing occurs are also widely known in the art. 

In preferred embodiments of this aspect of this 
invention, the mispair recognition protein is the product 
of the mutS gene of E. coli, or species variations 
thereof, or portions thereof encoding the recognition 
domain; the protein recognizes all eight possible base 
pair mismatches; detection of the DNA:protein complex 
comprises contacting the complexes with a selectively 
adsorbent agent under conditions such that the protein:DNA 
complexes are retained on the agent while DNA not 
complexed with protein is not retained, and measuring the 
amount of DNA in the retained complexes; the adsorbent 
agent is a membranous nitrocellulose filter; detection of 
the DNA:protein complex further includes the step wherein 
an antibody specific for the base mispair recognition 
protein is employed; the base mispair recognition protein 
is the product of the mutS gene of S. typhimurium, the hexA 
gene of S. pneumoniae or the MSH1 and MSH2 genes of yeast; 
and the step for detecting the DNA:protein complex 
further includes a step wherein the electrophoretic 
mobility of the DNA:protein complex is compared to that of 
uncomplexed DNA. 

The ability of the MutS protein to recognize 
examples of all eight single base pair mismatches within 
double-stranded DNA, even including C-C mispairs which do 
not appear to be corrected in vivo, is demonstrated by the 
fact that MutS protein protects DNA regions containing 
each mismatch from hydrolysis by DNase I (i.e., by "DNase 
I footprint" analyses), as recently reported (Su, S.-S., 
et al., 1988, J. Biol. Chem., 263, 6829-6835). The 
affinity of MutS protein for the different mispairs that 
have been tested varies considerably. Local sequence 
environment may also affect the affinity of the MutS 
protein for any given base mispair; in other words, for 
example, the affinity for two specific cases of A-C 
mispairs, which are surrounded by different sequences, may 
not be the same. Nevertheless, no examples of base 
mispairs have been found that are not recognized by 
isolated MutS protein. Accordingly, this method of the 
invention detects all mismatched base pairs. 
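
The count of eight mispairs follows from simple enumeration, and the same bookkeeping shows which mispair appears in the reciprocal heteroduplex formed by the other two strands of the same pair of duplexes. The sketch below is illustrative; the function names are hypothetical, and mispairs are treated as unordered (G-T and T-G are counted once).

```python
from itertools import combinations_with_replacement

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mispairs() -> list[tuple[str, str]]:
    """All unordered base pairings that are not A:T or G:C."""
    return [
        (x, y)
        for x, y in combinations_with_replacement("ACGT", 2)
        if COMPLEMENT[x] != y
    ]


def reciprocal(pair: tuple[str, str]) -> tuple[str, str]:
    """Mispair formed when the two remaining strands anneal.

    If base x (from duplex 1) faces base y (from duplex 2), the
    partners left over are the complement of y and the complement of x.
    """
    x, y = pair
    return tuple(sorted((COMPLEMENT[y], COMPLEMENT[x])))


for pair in mispairs():
    print("-".join(pair), "<->", "-".join(reciprocal(pair)))
```

Under these assumptions a C-C mispair in one heteroduplex always corresponds to a G-G mispair in the reciprocal heteroduplex, G-T corresponds to A-C, and A-G to C-T, while A-A pairs with T-T.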

It should be particularly noted that the DNA 
duplexes which MutS recognizes are not required to contain 
GATC sequences and, hence, they do not require 
hemimethylation of A's in GATC sequences, the specific 
signal for the full process of methyl-directed mispair 
correction in vivo; therefore, use of MutS in this method 
allows recognition of a DNA base mispair in DNAs lacking 
such methylation, for instance, DNAs isolated from human 
tissues. 

By "species variation" is meant a protein which 
appears to be functionally and in part, at least, 
structurally homologous to the E. coli MutS protein. One 
example of such a protein has also been discovered in a 
methyl-directed mispair correction system in Salmonella 
typhimurium bacteria (Pang et al., 1985, J. Bacteriol., 
163, 1007-1015). The gene for this protein has been shown 
to complement E. coli strains with mutations inactivating 
the mutS gene and the amino acid sequence of its product 
shows homology with that of the E. coli MutS protein. 
Accordingly, this S. typhimurium protein is also suitable 
for the practice of this aspect of the present invention. 

Other organisms, including man, are known to possess 
various systems for recognition and repair of DNA 
mispairs, which, as one skilled in the art would 
appreciate, comprise mispair recognition proteins 
functionally homologous to the MutS protein. Nuclear 
extracts derived from HeLa and Drosophila melanogaster Kc 
cell lines have been shown to support efficient strand-specific 
mismatch correction in vitro (Holmes et al., 
1990, Proc. Natl. Acad. Sci. USA 87, 5837-5841, which is 
incorporated herein by reference), and this reaction has 
been shown to occur by a mechanism similar to that of the 
bacterial reaction (Fang and Modrich 268 J. Biol. Chem. 
11838, 1993). Furthermore, genes encoding proteins that 
are homologous to bacterial MutS at the amino acid 
sequence level have been demonstrated in human (Fujii and 
Shimada 264 J. Biol. Chem. 10057, 1989) and yeast (Reenan 
and Kolodner 132 Genetics 963, 1992) and S. pneumoniae 
(Priebe et al., 170 J. Bacteriol. 190, 1988). Accordingly, 
it is believed that such DNA base mispair recognition 
proteins may also be suitable for use in the present 
invention. 

By "protein encoding the recognition domain" is 
meant a region of the mispair recognition protein which is 
involved in mispair recognition and binding. Such a 
domain comprises less than the complete mispair 
recognition protein. 

By a "selectively adsorbent agent" is meant any 
solid substrate on which protein:DNA complexes are 
retained while DNA not complexed with protein 
is not retained; such agents are known to those skilled in 
the art. Absent radioactive labeling of at least one 
strand used to form the DNA duplexes, the DNA in complexes 
on the filter may be detected by any of the usual means in 
the art for detection of DNA on a solid substrate, 
including annealing with complementary strands of 
radioactive DNA. 
The nitrocellulose filter method for detecting 
complexes of MutS protein with base mispairs in DNA has 
been reported in detail (Jiricny, J. et al., 1988, Nuc. 
Acids Res. 16, 7843-7853, which is hereby incorporated 
herein by reference). Besides simplicity, a major 
advantage of this method for detecting the DNA:protein 
complex over other suitable methods is the practical lack 
of a limitation on the size of DNA molecules that can be 
detected in DNA:protein complexes. Therefore, this 
embodiment of this method is in principle useful for 
detecting single base sequence differences between DNA 
fragments as large as can be practically handled without 
shearing. 

By "electrophoretic mobility" is meant a method 
of separating the DNA:protein complexes from DNA that does 
not form such complexes on the basis of migration in a gel 
medium under the influence of an electric field. 
DNA:protein complexes are less mobile than naked DNA. 
Such methods based on electrophoretic mobility are known 
to those skilled in the art. The DNA in the DNA:protein 
complexes may be detected by any of the usual standard 
means for detection of DNA in gel electrophoresis, 
including staining with dyes or annealing with 
complementary strands of radioactive DNA. Detecting 
complexes comprising the MutS base mispair recognition 
protein and mispairs in DNA duplexes is also described in 
the foregoing reference (Jiricny, J. et al., 1988, Nuc. 
Acids Res., 16, 7843-7853). Under the usual conditions 
employed in the art for detecting specific DNA:protein 
complexes by gel electrophoresis, complex formation of a 
protein with a double-stranded DNA fragment of up to 
several hundred base pairs is known to produce 
distinguishable mobility differences. 

Antibodies specific for a DNA mispair recognition 
protein can be prepared by standard immunological 
techniques known to those skilled in the art. 

Other suitable analytical methods for detecting 
the DNA:protein complex include immunodetection methods 
using an antibody specific for the base mispair 
recognition protein. For example, antibodies specific for 
the E. coli MutS protein have been prepared. Accordingly, 
one immunodetection method for complexes of MutS protein 
with DNA comprises the steps of separating the DNA:protein 
complexes from DNA that does not form such complexes by 
immunoprecipitation with an antibody specific for MutS 
protein, and detecting the DNA in the precipitate. 
According to the practice of this aspect of the invention, 
quantitative immunoassay methods known in the art may be 
employed to determine the number of single base mispairs 
in homologous regions of two DNA molecules, based upon 
calibration curves that can be established using complexes 
of a given mispair recognition protein with DNA duplexes 
having known numbers of mispairs. 

Another aspect of the invention features a 
method for detecting and localizing a base pair mismatch 
in a DNA duplex. The method includes contacting at least 
one strand of the first DNA molecule with the 
complementary strand of the second DNA molecule under 
conditions such that base pairing occurs, contacting the 
resulting double-stranded DNA duplexes with a mispair 
recognition protein under conditions such that the protein 
forms specific complexes with mispairs, subjecting the 
duplex molecules to hydrolysis with an exonuclease under 
conditions such that the complex blocks hydrolysis, and 
determining the location of the block to hydrolysis by a 
suitable analytic method. 

"Hydrolysis with an exonuclease" is a procedure 
known to those skilled in the art and utilizes enzymes 
possessing double-strand specific exonuclease activity, 
e.g., E. coli exonuclease III, RecBCD exonuclease, lambda 
exonuclease, and T7 gene 6 exonuclease. 

By "block to hydrolysis" is meant interference of 
hydrolysis by the exonuclease. Such protection can result 
from the mispair recognition protein protecting the DNA to 
which it is bound. 

By "suitable analytical method" is meant any 
method that allows detection of the block to exonuclease 
digestion, such as analysis of molecules by gel 
electrophoresis. Such methods are known to those skilled 
in the art. 

Methods for detecting and localizing base pair mismatches 
by mismatch repair system strand modification reactions 

In addition to methods that detect base sequence 
differences, this invention provides methods for both 
detecting and localizing a base pair mismatch in a DNA 
duplex. One method includes contacting at least one 
strand of the first DNA molecule with the complementary 
strand of the second DNA molecule under conditions such 
that base pairing occurs, contacting the resulting 
double-stranded DNA duplexes with a mismatch recognition protein 
under conditions such that the protein forms specific 
complexes with mispairs and thereby directs modification 
of at least one strand of the DNA in the resulting 
DNA:protein complexes in the vicinity of the DNA:protein 
complex, and determination of the location of the 
resulting DNA modification by a suitable analytical 
method. 

By "modification" is meant any alteration for 
which there is a means of detection, for instance a 
chemical modification including breaking of a chemical 
bond resulting in, as examples, cleavage between 
nucleotides of at least one DNA strand or removal of a 
base from the sugar residue of a nucleotide. Specific 
means for modifying DNAs in the vicinity of the 
DNA:protein complex are provided below for several 
embodiments of this aspect of the invention, together with 
interpretations of the phrase "in the vicinity of", as 
appropriate to the practical limitations of the 
modification approach in each instance. 

Suitable analytical methods for determining the 
location of the modification are known to those skilled in 
the art. Such a determination involves comparison of the 
modified DNA molecule with the homologous unmodified DNA 
molecule. 

In preferred embodiments of this aspect of the 
invention, the mispair recognition protein is the product 
of the mutS gene of E. coli or another functionally 
homologous protein; the step in which the DNA is modified 
in the vicinity of the DNA:protein complex further 
comprises contacting the DNA:MutS protein complex with a 
defined set or subset of E. coli DNA mismatch repair 
proteins (comprising E. coli MutH, MutL, DNA helicase II, 
single-stranded DNA binding protein, DNA polymerase III 
holoenzyme, exonuclease I, and exonuclease VII (or RecJ 
exonuclease), or species variations of these activities), 
ATP and one or more dideoxynucleoside-5'-triphosphates or 
in the absence of exogenous 
deoxyribonucleoside-5'-triphosphate under conditions that 
produce a discontinuity 
in one or both strands of the DNA duplex in the vicinity 
of the mismatch. 

DNAs used in such an analysis are to be 
unmethylated or hemimethylated at the 6-position of the 
adenine base in GATC sequences. With the exception of 
DNAs from some bacterial species, the chromosomes of most 
organisms naturally lack this modification. In those 
cases where hemimethylation of otherwise GATC-unmodified 
molecules is desired, this can be accomplished by use of 
E. coli Dam methylase as is well known in the art. 
Symmetrically methylated DNA prepared by use of this 
enzyme is denatured and subsequently reannealed with 
single-stranded sequences representing a homologous (or 
largely so) DNA. If necessary, hemimodified molecules 
produced by this renaturation procedure can be separated 
from unmethylated or symmetrically methylated duplexes 
which can also result from the annealing procedure. As is 
well known in the art, this can be accomplished by 
subjecting annealed products to cleavage by DpnI and MboI 
endonucleases. The former activity cleaves symmetrically 
methylated duplex DNA at GATC sites, while duplex DNA is 
subject to double strand cleavage by the latter activity 
only at unmodified GATC sites. Since 
hemimodified DNA is resistant to double strand cleavage by 
both DpnI and MboI, desired hemimethylated products can be 
separated on the basis of size from the smaller fragments 
produced by DpnI and MboI cleavage, for example by 
electrophoretic methods. 
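
The selection logic described above can be summarized as a small decision table. The sketch below encodes only the sensitivities stated in the text (DpnI cuts GATC sites methylated on both strands, MboI cuts GATC sites methylated on neither strand, hemimethylated sites resist both); the function names and the representation of a molecule as a list of per-site methylation flags are assumptions made for illustration.

```python
def cleaving_enzymes(top_methylated: bool, bottom_methylated: bool) -> set[str]:
    """Enzymes that make a double strand cut at one GATC site,
    given the adenine methylation state of each strand."""
    if top_methylated and bottom_methylated:
        return {"DpnI"}          # symmetrically methylated
    if not top_methylated and not bottom_methylated:
        return {"MboI"}          # unmodified
    return set()                 # hemimethylated: resistant to both


def survives_double_digest(sites: list[tuple[bool, bool]]) -> bool:
    """True if no GATC site in the molecule is cut by DpnI or MboI,
    so the molecule stays full length and can be recovered by size."""
    return all(not cleaving_enzymes(top, bottom) for top, bottom in sites)


# Three duplexes that can arise from the annealing step, each with two sites.
hemimethylated = [(True, False), (True, False)]
fully_methylated = [(True, True), (True, True)]
unmethylated = [(False, False), (False, False)]

print(survives_double_digest(hemimethylated))    # prints True
print(survives_double_digest(fully_methylated))  # prints False
print(survives_double_digest(unmethylated))      # prints False
```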

By "discontinuity in one or both strands of the 
DNA duplex" is meant a region which consists of a break in 
the phosphodiester backbone in one or both strands, or a 

single-stranded gap in a duplex molecule. 

One aspect of this preferred embodiment involves 
contacting the DNA:MutS protein complex with E. coli MutL 
and MutH proteins (or species variations thereof) in the 
presence of ATP and an appropriate divalent cation 
cofactor (e.g., Mg2+) so that mismatch-containing molecules 
will be subject to incision at one or more GATC sites in 
the vicinity of the mispair. Such incision events can be 
monitored by a suitable analytic method for size detection 
such as electrophoresis under denaturing conditions. 

A second aspect of this preferred embodiment 
involves contacting the DNA:MutS complex with a defined 
E. coli mismatch correction system consisting of E. coli 
MutH, MutL, DNA helicase II, single-stranded DNA binding 
protein, DNA polymerase III holoenzyme, exonuclease I, and 
exonuclease VII (or RecJ exonuclease), or species variants 
of these activities, and ATP, in the absence of exogenous 
deoxyribonucleoside-5'-triphosphates or in the presence of 
one or more dideoxynucleoside-5'-triphosphates, such that 
single-stranded gaps are produced in the vicinity of the 
complexed protein; the method for determining the location 
of the single-stranded gaps within the DNA duplex further 
includes analysis of electrophoretic mobility of treated 
samples under denaturing conditions, or the steps of 
cleaving the DNA with a single-strand specific 
endonuclease and comparing the electrophoretic mobilities 
of the cleaved fragments with unmodified DNA fragments 
under non-denaturing conditions; the step for modifying 
the DNA duplex in the vicinity of the complexed protein 
comprises contacting the complexes with proteins of a 
mismatch repair system, ATP and a divalent cation under 
conditions such that an endonucleolytic incision is 
introduced at one or more GATC sequences in the duplex 
molecule. 



An example of a complete defined mismatch 
correction system comprises the following purified 
components: E. coli MutH, MutL, and MutS proteins, DNA 
helicase II, single-stranded DNA binding protein, DNA 
polymerase III holoenzyme, exonuclease I, DNA ligase, ATP, 
and the four deoxynucleoside-5'-triphosphates. This set 
of proteins can process seven of the eight base-base 
mismatches in a strand-specific reaction that is directed 
by the state of methylation of a single GATC sequence 
located 1 kilobase from the mispair. This defined system 
is described further in Example 1, below. The 5' to 3' 
exonuclease function can be supplied either by DNA 
polymerase III holoenzyme preparations that contain this 
activity or as a separate defined component consisting of 
exonuclease VII or RecJ exonuclease. It should be noted 
that the lack of ability to repair C-C base mispairs in 
this embodiment of this aspect of the present invention is 
not a major limitation of the method for detecting all 
possible base sequence differences between any two 
naturally occurring DNA sequences because mutations that 
would give rise to a C-C mispair upon hybridization would 
also give rise to a G-G mismatch when the complementary 
strands are hybridized. 

For the purpose of generating single-stranded 
gaps in the vicinity of the DNA:MutS protein complexes, 
DNA duplexes containing mispaired base pairs are contacted 
with the defined mismatch correction system under the 
standard conditions described in Example 1, Table 3 
(Complete reaction), except for the following differences: 
(i) exogenous dNTPs are omitted; or (ii) 
2',3'-dideoxynucleoside-5'-triphosphates (ddNTPs) at suitable 
concentrations (10 to 100 µM) are substituted for dNTPs; 
or (iii) reactions containing dNTPs are supplemented with 
ddNTPs at a suitable concentration to yield a chain 
termination frequency sufficient to inhibit repair of 
single-strand gaps. In cases (i)-(iii) DNA ligase may be 
omitted from the reaction. In cases (ii) and (iii) all 
four ddNTPs may be present; however, it is expected that 
the presence of one, two, or three ddNTPs will prove 
sufficient to stabilize single-strand gaps via chain 
termination events. While it is expected that most 
applications of these gap forming protocols will utilize 
MutH, it is pertinent to note that the requirement for 
methyl-directed strand incision by MutH may be obviated by 
provision of a single-strand nick by some other means 
within the vicinity of the mispair, as described in 
Example 1, Figure 5. A suitable means for inducing such 
nicks in DNA is limited contact with a nuclease, DNase I, 
for example; under conditions that are well known in the 
art, this approach creates nicks randomly throughout 
double-stranded DNA molecules at suitable intervals for 
allowing the mispair correction system to create 
single-stranded gaps in the vicinity of a mispair anywhere in the 
DNA. 

It should be noted that in this embodiment of 
this method for localizing mismatch base pairs, "in the 
vicinity of" a base mispair is defined practically by the 
size of the single-strand gaps typically observed under
the above conditions, namely up to about one kbp from the
mismatched base pair.

"Determining the location of the single-
stranded gaps within the DNA duplex" entails the steps of:
(i) cleaving the DNA with at least one restriction
endonuclease (either prior or subsequent to contact of the
preparation with mismatch repair activities) followed by
comparison of electrophoretic mobilities under denaturing
conditions of the resulting modified DNA fragments with 
DNA restriction fragments not contacted with the defined 
mismatch correction system; or (ii) cleaving the DNA with
at least one restriction endonuclease and with a single-
strand specific endonuclease, followed by comparison of
the electrophoretic mobilities under native conditions of
the resulting modified DNA fragments with DNA restriction
fragments not contacted with the defined mismatch
correction system. Suitable single-strand specific
endonucleases include the S1 single-strand specific
nuclease, for example, or other functionally similar
nucleases well known in the art. In either case
(i) or (ii), additional restriction mapping may be
performed as needed to further localize any fragment
modifications observed in initial application of the
method, until, if desired, a restriction fragment of
convenient size for direct sequence determination is
obtained for direct comparisons of sequences of the two
DNA molecules in the vicinity of the base sequence
difference.
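
The comparison of treated and untreated restriction fragments described
above can be pictured as a difference between two lists of band sizes.
The following is a minimal sketch only: the function name, the 2%
size tolerance, and the band sizes are illustrative assumptions and are
not taken from the Examples.

```python
def modified_fragments(control_sizes, treated_sizes, tolerance=0.02):
    """Compare two lanes of fragment sizes (in nucleotides).

    Returns (lost, gained): control bands with no counterpart in the
    treated lane, and treated bands with no counterpart in the control
    lane. Two bands count as the same band when their sizes differ by
    no more than `tolerance` (a fraction of the larger size), which is
    an assumed stand-in for gel resolution.
    """
    def has_match(size, others):
        return any(abs(size - o) <= tolerance * max(size, o) for o in others)

    lost = [s for s in control_sizes if not has_match(s, treated_sizes)]
    gained = [s for s in treated_sizes if not has_match(s, control_sizes)]
    return lost, gained


# Hypothetical lanes: the 4200-nt fragment carries the gap and is
# replaced by two shorter species in the treated lane.
control = [6500, 4200, 2300, 900]
treated = [6500, 2300, 900, 3100, 600]
lost, gained = modified_fragments(control, treated)
print("modified fragment(s):", lost, "new species:", gained)
```

A fragment reported as lost is the candidate for further restriction
mapping as described in the preceding paragraph.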

By "proteins of a mismatch repair system" is
meant a set of proteins that includes a GATC endonuclease, a
mispair recognition protein, and proteins that participate
in the activation of the GATC endonuclease.

By "divalent cation" is meant a cofactor for the
GATC endonucleases, e.g., MgCl2.

By "endonucleolytic incision" is meant cleavage
of a DNA fragment containing a mismatched base pair at
unmethylated or hemimethylated GATC sequences in the
vicinity of a mismatch.
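
As an illustration of which GATC sequences fall "in the vicinity of" a
mismatch, the sketch below lists GATC sites within a chosen distance of
a mismatch position. The function name and coordinates are hypothetical;
the default window of 1000 bp reflects the roughly one-kilobase distance
mentioned in this description, and the methylation state of each site is
not modelled.

```python
import re


def gatc_sites_near(sequence, mismatch_pos, window=1000):
    """Return 0-based start positions of GATC sites within `window`
    base pairs of `mismatch_pos`.

    GATC is palindromic, so a site found on one strand is also present
    at the same location on the complementary strand.
    """
    seq = sequence.upper()
    return [m.start() for m in re.finditer("GATC", seq)
            if abs(m.start() - mismatch_pos) <= window]


# Hypothetical 32-bp duplex with GATC sites at positions 2 and 28.
example = "AAGATCTT" + "A" * 20 + "GATC"
print(gatc_sites_near(example, mismatch_pos=5, window=10))
print(gatc_sites_near(example, mismatch_pos=5))
```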

"Size fractionation by electrophoretic mobility 
under denaturing conditions" is a procedure well known to
those skilled in the art. Gel electrophoresis can be either
conventional or pulsed-field.

Modification of mispair recognition proteins and urpr

The present invention also includes forms of
mispair recognition proteins which have been altered to
provide means for modifying at least one strand of the DNA
duplex in the vicinity of the bound mispair recognition
protein.

In preferred embodiments of this aspect of the
invention, the altered mispair recognition protein is the
modified product of the mutS gene of E. coli or is another
functionally homologous modified protein to which is
attached a hydroxyl radical cleaving function; the
altered mispair recognition protein may comprise only a
segment of the native molecule containing the mispair
recognition domain; the hydroxyl radical cleaving function
is selected from the group consisting of
the 1,10-phenanthroline-copper complex, the EDTA-iron
complex, and the copper binding domain of serum albumin;
the altered mispair recognition protein is the product of
the mutS gene of E. coli or of another functionally
homologous protein to which is attached a DNA
endonuclease activity capable of cleaving double-stranded
DNA; the endonuclease activity is provided by the DNA
cleavage domain of FokI endonuclease.

By "altered mispair recognition protein" is meant
a mispair recognition protein that not only recognizes and
binds to a base pair mismatch, but also possesses the
ability to modify a strand of a DNA molecule containing
such a mismatch.

Several methods for attaching a hydroxyl radical
cleaving function to a DNA binding protein are known in
the art. For example, lysyl residues may be modified by
chemical attachment of the 1,10-phenanthroline-copper
complex, resulting in conversion of a
DNA binding protein into a highly efficient site-specific
nuclease that cleaved both DNA strands (in the presence of
hydrogen peroxide as a coreactant) within the 20 base pair
binding site of the protein, as determined by DNase I
footprinting (C.-H. Chen and D. S. Sigman, 1987, Science,
237, 1197). Chemical attachment of an EDTA-iron complex
to the amino terminus of another DNA binding protein
similarly produced a sequence-specific DNA cleaving
protein that cut both strands of the target DNA within a
few bases of a recognition site of similar size (J. P.
Sluka, et al., 1987, Science, 235, 777).

An alternate means for attaching the hydroxyl
radical cleaving function to this same protein involved
extension of the amino terminus with the three amino
acids, Gly-Gly-His, which is a consensus sequence for the
copper-binding domain of serum albumin (D. P. Mack et al.,
1988, J. Am. Chem. Soc. 110, 7572-7574). This approach
allows for preparation of such an artificial DNA cleaving
protein directly by recombinant methods, or by direct
synthesis using standard solid phase methods when the
peptide is sufficiently short, as it was in this case (55
residues including the 3 added amino acids), thereby
avoiding the need for an additional chemical modification
step of the reagent, which is both time consuming and
difficult in large scale production. In contrast to the
EDTA-iron complex, the particular peptide sequence
constructed in this instance cleaved only one example out
of four recognition sites in different sequence
environments.



Nevertheless, one skilled in the art of protein
engineering would appreciate that this general approach
for converting a DNA binding protein into a DNA cleaving
protein by attachment of a hydroxyl radical cleavage
function is widely applicable. Hence, DNA base mispair
recognition proteins which normally only bind to DNA are
modified to cleave DNA by attachment of a hydroxyl
radical cleavage function, according to the practice of
this aspect of this invention, without undue
experimentation, by adjustment of appropriate variables
taught in the art, particularly the chemical nature and
length of the "spacer" between the protein and the metal
binding site.

Additional altered forms of mispair recognition
proteins that modify at least one strand of the DNA in a
DNA:protein complex in the vicinity of the bound protein
according to the present invention include proteins
comprising the portions or "domains" of the unmodified
base mispair recognition enzymes that are essential for
binding to a DNA mispair. These essential DNA binding
domains further comprise peptide sequences that are most
highly conserved during evolution; such conserved domains
are evident, for example, in comparisons of the sequences
of the E. coli MutS protein with functionally homologous
proteins in S. typhimurium and other structurally similar
proteins. Accordingly, peptide sequences of a DNA base
mispair recognition protein that are protected from
proteases by formation of specific complexes with mispairs
in DNA and, in addition or in the alternative, are
evolutionarily conserved, form the basis for a
particularly preferred embodiment of this aspect of the
present invention, since such peptides constitute less
than half the mass of the intact protein and, therefore,
are advantageous for production and, if necessary, for
chemical modification to attach a cleavage function for
conversion of the DNA binding protein into a DNA cleavage
protein specific for sites of DNA base mispairs.

The DNA cleavage domain of FokI endonuclease has
been defined (Li et al., 1992, Proc. Natl. Acad. Sci.
U.S.A., 89:4275).

Another embodiment of this aspect of the
invention consists of a method for detecting and
localizing a base pair mismatch within a DNA duplex,
including the steps of contacting at least one strand of
the first DNA molecule with the complementary strand of
the second DNA molecule under conditions such that base
pairing occurs; contacting resultant duplex DNA molecules
with an altered mispair recognition protein, under
conditions such that the protein forms specific complexes
with a mispair and thereby directs modification of at
least one strand of the DNA in the resulting DNA:protein
complexes in the vicinity of the DNA:protein complex; and
determining the location of the modification of the DNA by
a suitable analytic method.

In the method for detecting and localizing a base pair
mismatch according to this embodiment, which employs
an altered mispair recognition protein, the
modification comprises double-stranded cleavage of the DNA
within the vicinity of any base mispair, wherein the
"vicinity" substantially corresponds to the sequence of
DNA protected by the binding of the protein to a base
mispair, generally within about 20 base pairs. A single-
strand specific nuclease, S1, for instance, may be used to
augment cleavage by the modified base mispair recognition
protein in the event that a single-strand bias is
suspected in the cleavage of any DNAs with which the
protein forms a specific complex. Alternatively, DNAs
subject to cleavage by the modified mispair recognition
protein may be analyzed by electrophoresis under
denaturing conditions. The location of the modification is
determined by suitable analytical methods known to those
skilled in the art.

Methods utilizing mismatch repair systems to detect A-G 
base pair mismatches 

In a preferred embodiment, a method for detecting
and localizing A-G mispairs in a DNA duplex includes the
steps of contacting at least one strand of the first DNA
molecule with the complementary strand of the second DNA
molecule under conditions such that base pairing occurs;
contacting resultant duplex DNA molecules with a mispair
recognition protein that recognizes A-G mispairs and an
apurinic endonuclease or lyase under conditions such that
in the presence of a mismatch an endonucleolytic incision
is introduced in the duplex molecule; and determining the
location of the incision by a suitable analytic method.

In preferred embodiments the A-G mispair
recognition protein is the product of the mutY gene of E.
coli; and the analytical method includes gel
electrophoresis.

The present invention also comprises a DNA mispair
recognition protein that recognizes primarily A-G mispairs
without any apparent requirement for hemimethylation. One
example of this protein, the product of the mutY gene of
E. coli, is a glycosylase which specifically removes the
adenine from an A-G mispair in a DNA duplex. The MutY
protein has been purified to near homogeneity by virtue of
its ability to restore A-G to C-G mismatch correction to
cell-free extracts (K. G. Au et al., Proc. Natl. Acad. Sci.
U.S.A., 85, 9163, 1988) of a mutS mutY double mutant
strain of E. coli, as described in Example 2, below. Its
electrophoretic migration in the presence of dodecyl
sulfate is consistent with a molecular weight of 36 kDa,
and it apparently exists as a monomer in solution. MutY,
an apurinic (AP) endonuclease, DNA polymerase I, and DNA
ligase are sufficient to reconstitute MutY-dependent, A-G
to C-G repair in vitro. A DNA strand that has been
depurinated in this way by the MutY protein is susceptible
to cleavage by any of several types of AP endonuclease or
lyase (e.g., human AP endonuclease II) or by piperidine,
under conditions that are well known in the art. The
cleavage products are then analyzed by gel electrophoresis
under denaturing conditions. Accordingly, this MutY
protein is useful in a method for the specific detection
and localization of A-G mispairs, according to the
practice of the present invention, and hence
identification of A·T to C·G or G·C to T·A mutations.
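
The relationship between a single base-pair difference and the mispairs
it produces on cross-annealing can be checked by enumeration. The sketch
below is illustrative only (the function name and representation are
assumptions): for a reference duplex and a variant duplex that differ at
one position, it lists the two mispairs formed when the strands are
exchanged. It shows that A-G mispairs arise only from A·T/C·G and
G·C/T·A differences, and that a C-C mispair is always accompanied by a
G-G mispair in the reciprocal heteroduplex.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def heteroduplex_mispairs(ref_base, alt_base):
    """Return the two mispairs formed when a reference duplex and a
    variant duplex differing at one base pair are denatured and
    cross-annealed. `ref_base` and `alt_base` are the bases on the
    same (top) strand of the two duplexes."""
    if ref_base == alt_base:
        raise ValueError("the two duplexes do not differ at this position")
    # Top strand of the reference paired with bottom strand of the variant.
    first = (ref_base, COMPLEMENT[alt_base])
    # Top strand of the variant paired with bottom strand of the reference.
    second = (alt_base, COMPLEMENT[ref_base])
    # A mispair is unordered, so sort the two bases for a stable label.
    return frozenset("-".join(sorted(pair)) for pair in (first, second))


for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            print(f"{ref}->{alt}:", sorted(heteroduplex_mispairs(ref, alt)))
```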

Sources of DNA fragments to be analyzed

In another embodiment of the invention, DNA
molecules are obtained from the following sources:
different individuals of the same species, individuals of
different species, individuals of different kingdoms,
different tissue types, the same tissue type in different
states of growth, different cell types, cells of the same
type in different states of growth, cells of the same
origin in different stages of development, and cells of
the same type that may have undergone differential somatic
mutagenesis, e.g., one class of which may harbor pre-
cancerous mutation(s).

In a preferred embodiment, the DNA molecules
comprise a probe sequence that has been at least partially
characterized.

By "probe sequence that has been at least
partially characterized" is meant a DNA molecule from any
source that has been characterized by restriction mapping
or sequence analysis; such techniques are known to those
skilled in the art.

Kits comprising a mispair recognition protein

Another aspect of the invention features assay 
kits designed to provide components to practice the 
methods of the invention. 

In one aspect the invention features an assay kit
for detecting a base pair mismatch in a DNA duplex. The
kit comprises one or more of the following components:
an aliquot of a mispair recognition protein, an aliquot of
control oligonucleotides, and an exonuclease.

In a preferred embodiment the mispair recognition
protein is the product of the mutS gene of E. coli.

By "control oligonucleotides" is meant
oligonucleotides for assaying the binding of the mismatch
repair protein to a base pair mismatch. One set of
oligonucleotides is perfectly homologous (negative
control) and thus is not bound by the mispair recognition
protein. Another set of oligonucleotides contains a
base pair mismatch (positive control) and thus is bound
by the mispair recognition protein.
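
Whether a designed oligonucleotide pair would serve as a negative or a
positive control can be checked before synthesis. The following is a
minimal sketch with hypothetical sequences and function names; both
strands are written 5' to 3', and only equal-length, blunt-ended pairs
are handled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(top, bottom):
    """Return 0-based positions (along `top`) at which the annealed
    oligonucleotides are not Watson-Crick paired. Both sequences are
    given 5'->3', so `bottom` is reversed before comparison."""
    if len(top) != len(bottom):
        raise ValueError("this sketch only handles equal-length oligonucleotides")
    bottom_3to5 = bottom[::-1]
    return [i for i, (t, b) in enumerate(zip(top, bottom_3to5))
            if COMPLEMENT[t] != b]


def control_type(top, bottom):
    """'negative' for a perfectly paired duplex, 'positive' otherwise."""
    return "negative" if not mismatch_positions(top, bottom) else "positive"


print(control_type("GATTACA", "TGTAATC"))        # perfectly paired
print(mismatch_positions("GATTACA", "TGTGATC"))  # G opposite T at index 3
```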

By "exonuclease" is meant enzymes possessing
double-strand specific exonuclease activity, e.g., E. coli
exonuclease III, RecBCD exonuclease, lambda exonuclease,
and T7 gene 6 exonuclease.

Another aspect of the invention features an assay
kit for detecting and localizing a base pair mismatch in
a DNA duplex. The kit comprises one or more of the
following components: an aliquot of all or part of a
mismatch repair system; an aliquot of dideoxynucleoside
triphosphates; and a single-strand specific endonuclease.

By "all or part of a mismatch repair system" is
meant either the complete system which is capable of
repairing a base pair mismatch, for example, the three E.
coli proteins MutH, MutL, and MutS, DNA helicase II,
single-strand binding protein, DNA polymerase III,
exonuclease I, exonuclease VII or RecJ exonuclease, DNA
ligase and ATP, or only the three proteins MutH, MutL, and
MutS, along with ATP, such that an endonucleolytic incision
is made at a GATC site, with no subsequent repair reaction
taking place.

In preferred embodiments the mismatch repair
system includes the products of the E. coli mutH, mutL,
and mutS genes, or species variations thereof, DNA
helicase II, single-strand DNA binding protein, DNA
polymerase III holoenzyme, exonuclease I, exonuclease VII
or RecJ exonuclease, DNA ligase, and ATP; or the mismatch
repair system includes only the products of the E. coli
mutH, mutL, and mutS genes, or species variations thereof,
and ATP.

Another embodiment of the invention features an
assay kit for detecting and localizing a base pair
mismatch in a DNA duplex comprising an aliquot of a
modified mispair recognition protein.

In a preferred embodiment the mispair recognition
protein is the product of the mutS gene of E. coli.

A further embodiment of this aspect of the
invention features an assay kit for detecting and
localizing an A-G mispair within a DNA duplex. The kit
comprises one or more of the following components: an
aliquot of an A-G mispair recognition protein; and an
aliquot of an apurinic endonuclease or lyase.

In a preferred embodiment the A-G mispair
recognition protein is the product of the mutY gene of E.
coli.

Methods utilizing mismatch repair systems and recombinase
proteins

In a further aspect, the invention features a
method for eliminating DNA molecules containing one or
more mismatches from a population of heterohybrid duplex
DNA molecules formed by base pairing of single-stranded
DNA molecules obtained from a first source and a second
source. The method includes digesting genomic DNA from
the first and the second source with a restriction
endonuclease, methylating the DNA of one of the sources,
denaturing the DNA from one or both sources, mixing the DNA
molecules from the first and the second source in the
presence of a recombinase protein, proteins of a mismatch
repair system that modulate the recombinase protein,
single-strand binding protein, and ATP under conditions
such that DNA duplexes form in homologous regions of the
DNA molecules from the first and the second source and the
presence of a base pair mismatch results in regions that
remain single-stranded, and removing molecules that
contain single-stranded regions from the population.

By "heterohybrid" is meant a duplex DNA molecule
that consists of base-paired strands originating from two
different sources, such that one strand of the duplex is
from one source (first source) and the other strand is
from another source (second source).

The "source" of DNA molecules designates the
origin of the genomic DNA used in the method. The first
and second sources are different, i.e., not from the same
cell of the same individual.

By "restriction endonuclease" is meant an enzyme
which recognizes specific sequences in double-stranded DNA
and introduces breaks in the phosphodiester backbone of both
strands. For use in the current invention, restriction
endonucleases that digest genomic DNA or cDNA into
fragments of approximately 4 to 20 kilobases are
preferred.
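
One way to judge whether a candidate enzyme meets the 4 to 20 kilobase
preference is to compute the fragment lengths from known or predicted
cut positions and ask how much of the DNA falls in that range. The
sketch below assumes a linear molecule and hypothetical cut positions;
the function names are illustrative.

```python
def digest(sequence_length, cut_positions):
    """Fragment lengths (bp) from cutting a linear molecule of
    `sequence_length` bp at the given internal positions."""
    cuts = sorted({p for p in cut_positions if 0 < p < sequence_length})
    edges = [0] + cuts + [sequence_length]
    return [right - left for left, right in zip(edges, edges[1:])]


def fraction_in_range(fragments, low=4000, high=20000):
    """Fraction of total DNA length carried by fragments of
    `low`..`high` bp (defaults follow the 4 to 20 kb preference)."""
    kept = sum(f for f in fragments if low <= f <= high)
    return kept / sum(fragments)


fragments = digest(50000, [3000, 10000, 35000])
print(fragments, fraction_in_range(fragments))
```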

By "methylating" is meant the process by which a
methyl group is attached to the adenine residue of the
sequence "GATC". This reaction is carried out by enzymes
well known in the art, such as the DAM system of E. coli.

By "denaturing" is meant the process by which
strands of duplex DNA molecules are no longer base paired
by hydrogen bonding and are separated into single-stranded
molecules. Methods of denaturation are well known to
those skilled in the art and include thermal denaturation
and alkaline denaturation.

By "recombinase protein" is meant a protein that
catalyzes the formation of DNA duplex molecules. Such a
molecule is capable of catalyzing the formation of duplex
DNA molecules from complementary single-stranded molecules
by renaturation or by catalyzing a strand transfer
reaction between a single-stranded molecule and a double-
stranded molecule. Examples of such a protein are the
RecA proteins of E. coli and S. typhimurium.

By "proteins of a mismatch repair system that
modulate the recombinase protein" are meant components of
a system which recognizes and corrects base pairing errors
in duplex DNA molecules and which also influence the
activity of a recombinase protein. For example, a mispair
recognition protein, e.g., MutS, and a protein that
interacts with the mismatch repair protein, e.g., MutL,
together inhibit duplex formation catalyzed by the
recombinase protein in the presence of a base pair
mismatch. Such modulation of the recombinase protein
results in single-stranded regions downstream of the base
pair mismatch.

In preferred embodiments, the recombinase protein
is the E. coli RecA protein; the mismatch repair system is
from E. coli and the components are the MutS and MutL
proteins; the sources of DNA are different individuals of
the same species, individuals of different species,
individuals of different kingdoms, different tissue types,
the same tissue type in different states of growth,
different cell types, cells of the same type in different
states of growth, cells of the same origin in different
stages of development, and cells of the same origin that
may have undergone differential somatic mutagenesis; the
method of removing molecules containing single-stranded
regions is by chromatography on benzoylated naphthoylated
DEAE; the method of removing molecules containing single-
stranded regions is by treatment with a single-strand
specific nuclease.

The MutS and MutL proteins, along with single-strand
binding protein and ATP, are involved in modulation of the
E. coli RecA protein in catalyzing heteroduplex formation.

The method for removing molecules containing
single strands from double-stranded molecules by the use
of chromatography with benzoylated naphthoylated DEAE is
well known to those skilled in the art.

By "single strand specific nuclease" is meant an
enzyme that specifically degrades single-stranded regions
of DNA molecules and does not degrade double-stranded
regions. Examples of such nucleases are: S1, mung bean,
T7 gene 3 endonuclease and P1 nuclease.

In another aspect, the invention features a
method for eliminating DNA molecules containing one or
more mismatches from a population of heterohybrid duplex
DNA molecules formed by a strand transfer reaction between
duplex DNA molecules obtained from a first source and
denatured DNA molecules from a second source. The method
includes digesting genomic DNA from the first and the
second source with a restriction endonuclease, methylating
the DNA of one of the sources, denaturing the DNA from the
second source, mixing the DNA molecules from the first and
the second source in the presence of a protein which
catalyzes strand transfer reactions, proteins of a
mismatch repair system that modulate the protein with strand
transfer activity, single-strand binding protein, and ATP
under conditions such that DNA heteroduplexes form in
homologous regions of the DNA molecules from the first and
the second source by strand transfer reaction and the
presence of a base pair mismatch results in regions that
remain single-stranded, and removing molecules that
contain the single-stranded regions from the population.

By "strand transfer reaction" is meant a three-
strand reaction between duplex DNA from one source and
single-stranded DNA from another source in which one
strand of the duplex is displaced by the single-stranded
molecule.

By "a protein which catalyzes strand transfer
reaction" is meant proteins such as: RecA, homologs of
RecA, and proteins with branch migration enhancing
activities such as RuvA, RuvB, RecG.

In preferred embodiments, the strand transferase
protein is the E. coli RecA protein; the mismatch repair
system is from E. coli and the components are the MutS and
MutL proteins; the sources are different individuals of
the same species, individuals of different species,
individuals of different kingdoms, different tissue types,
the same tissue type in different states of growth,
different cell types, cells of the same type in different
states of growth, cells of the same origin in different
stages of development, cells of the same origin that
may have undergone differential somatic mutagenesis (e.g.,
normal as opposed to pre-tumor cells), and a probe sequence
that has been at least partially characterized; the method
of removing molecules containing single-stranded regions
is by chromatography on benzoylated naphthoylated DEAE;
the method of removing molecules containing single-
stranded regions is by treatment with a single-strand
specific nuclease.

Methods of improving the Genomic Mismatch Scanning 
technique 

In another aspect the invention features the
utilization of a recombinase or strand transferase and
proteins of a mismatch repair system that modulate the
recombinase or strand transferase, in the hybridization
step of the genomic mismatch scanning technique.
Formation of duplex molecules catalyzed by a recombinase
or strand transferase protein which is modulated by
components of a mismatch repair system provides an
additional selection step in the GMS method.

By "genomic mismatch scanning" is meant a
technique to identify regions of genetic identity between two
related individuals. Such a technique has been described
by Nelson et al., 4 Nature Genetics 11, 1993.

In a further embodiment the invention features a
method of genomic mismatch scanning such that
heterohybrid DNA molecules containing a base pair mismatch
are removed, without the use of exonuclease III. The
method comprises the steps of contacting a population of
heterohybrid DNA molecules potentially containing base
pair mismatches with all the components of a DNA mismatch
repair system in the absence of dNTPs or in the presence
of one or more dideoxynucleoside triphosphates under
conditions such that single-stranded gaps are generated in
DNA fragments that contained a base pair mismatch, and
removing the molecules containing single-stranded gaps.

In preferred embodiments the DNA mismatch repair
system is the E. coli methyl-directed mismatch repair
system; removal of molecules containing single-stranded
regions is by chromatography on benzoylated naphthoylated
DEAE; removal of molecules containing single-stranded
regions is by treatment with a single-strand specific
nuclease.

In a further embodiment, the invention features
another variation of the method of genomic mismatch
scanning such that heterohybrid DNA molecules containing
base pair mismatches are removed, without the use of
exonuclease III. The method comprises the steps of
contacting a population of heterohybrid DNA molecules
potentially containing base pair mismatches with all the
components of a DNA mismatch repair system and
biotinylated nucleoside triphosphates under conditions
such that biotinylated nucleotides are incorporated into
DNA fragments that contained a base pair mismatch, and
removing the molecules containing biotinylated nucleotides
by binding to avidin.

Substitution with biotinylated nucleotides and
binding of molecules that have incorporated these
nucleotides are procedures well known to those skilled in
the art. This procedure allows fractionation of a
population of hybrid DNA molecules into two fractions:
(i) a mismatch-free fraction which fails to adhere to
avidin; and (ii) a population that originally contained
mispairs and which binds to avidin. The former can be
utilized in the GMS procedure. The latter, avidin-bound
class can be employed for other purposes. For example,
when prepared using heterohybrid DNA produced by annealing
DNA from two related haploid organisms, the biotinylated
sequences correspond to those DNA regions that vary
genetically between the two organisms. Such sequences can
thus be applied to determination of the molecular basis of
genetic variation of the organisms in question, e.g.,
pathogenic versus nonpathogenic microbial subspecies.

In a preferred embodiment the mismatch repair
system is the methyl-directed mismatch repair system of E.
coli.
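
The two-fraction outcome of the biotin and avidin procedure can be
expressed as a simple partition. The sketch below is an idealized model
with hypothetical names and data: it assumes every duplex that contained
a mispair incorporates biotinylated nucleotide during repair synthesis
and that mismatch-free duplexes incorporate none.

```python
from dataclasses import dataclass


@dataclass
class Duplex:
    name: str
    mismatches: int  # number of mispairs in the heterohybrid


def fractionate(duplexes):
    """Split duplexes into (flow_through, avidin_bound) name lists.

    Idealized assumption: repair synthesis, and hence biotin
    incorporation, occurs if and only if the duplex had a mispair.
    """
    flow_through, avidin_bound = [], []
    for d in duplexes:
        target = avidin_bound if d.mismatches > 0 else flow_through
        target.append(d.name)
    return flow_through, avidin_bound


population = [Duplex("frag1", 0), Duplex("frag2", 2), Duplex("frag3", 0)]
print(fractionate(population))
```

The flow-through list corresponds to fraction (i) above and the
avidin-bound list to fraction (ii).
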

In a further embodiment, the invention features
a method of genomic mismatch scanning such that
duplex DNA molecules are subject to exonuclease III
digestion only after ligation into monomer circles.

By "ligation into monomer circles" is meant
ligation of molecules under conditions of dilute
concentration such that ends of the same molecule become
ligated. Such a procedure is known to those skilled in
the art. In these methods it is sometimes advantageous to
separate molecules having mismatches from those which do
not. By use of appropriate separation procedures both
such populations of molecules can be selected.

Methods applying mismatch repair systems to populations of
amplified molecules 

In another aspect, the invention features a
method for correcting base pair mismatches in a population
of DNA duplexes that have been produced by enzymatic
amplification, potentially containing one or more base pair
mismatches. The method includes contacting the population
of DNA duplexes with a DNA methylase and a mismatch repair
system such that base pair mismatches are corrected.

By "enzymatic amplification" is meant a reaction
by which DNA molecules are amplified. Examples of such
reactions include the polymerase chain reaction and
reactions utilizing reverse transcription and subsequent
DNA amplification of one or more expressed RNA sequences.

By "mismatch repair system" is meant a complete
system such that base pair mismatches are detected and
corrected.

In a preferred embodiment, the mismatch repair
system is the methyl-directed mismatch repair system of E.
coli. Components of the defined system capable of
correcting mismatches include MutH, MutL, and MutS proteins,
DNA helicase II, single-strand binding protein, DNA
polymerase III holoenzyme, exonuclease I, exonuclease VII
or RecJ, DNA ligase, ATP and four deoxynucleoside
triphosphates.

In a further aspect, the invention features a
method for removing DNA molecules containing one or more
base pair mismatches in a population of molecules that
have been produced by enzymatic amplification, potentially
containing one or more base pair mismatches. The method
includes contacting a population of enzymatically
amplified molecules with components of a mismatch repair system
under conditions such that one or more components of the
repair system form a specific complex with a base pair
mismatch contained in a DNA duplex, and removing DNA
duplexes containing the complex from the population of
duplex molecules.

By "complex" is meant the result of specific
binding of at least one component of a mismatch repair
system to a base pair mismatch.

In a preferred embodiment, the mismatch repair
system is the E. coli methyl-directed mismatch repair
system, the component of the system is the MutS protein,
the MutS protein is affixed to a solid support and removal
of the DNA duplex containing the complex is by binding to
this support.

Methods of attachment of proteins to solid
support systems and use of those systems to perform
chromatography so as to remove specific molecules are well
known to those skilled in the art.

In another embodiment, the invention features a
method for removing DNA molecules containing one or more
base pair mismatches in a population of DNA duplexes that
have been produced by enzymatic amplification, potentially
containing one or more base pair mismatches. The method
comprises the steps of contacting the population of DNA
duplexes with components of a mismatch repair system under
conditions such that an endonucleolytic incision is made
on a newly synthesized strand of a DNA duplex molecule
containing a base pair mismatch so that such a molecule
cannot produce a full-sized product in a subsequent round
of enzymatic amplification.

By "endonucleolytic cleavage" is meant cleavage
on the unmethylated strand at a hemimethylated GATC
sequence by components of a mismatch repair system.

By "full sized product" is meant a molecule that
includes the entire region of interest that is subject to
amplification. Molecules that contain endonucleolytic
cleavage cannot be amplified in subsequent rounds to
produce full sized product and thus will be eliminated
from the final amplified product population.
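By way of illustration only, the elimination logic just described can be expressed as a toy model. The sketch below is not part of the disclosed method. It assumes that a template strand is described solely by the coordinates of the region of interest and by the positions of any strand breaks, and that a polymerase cannot copy across a break; the function name `can_yield_full_size_product` is hypothetical.

```python
def can_yield_full_size_product(region_start, region_end, incision_sites):
    """Return True if a template strand can be copied across the whole region.

    region_start, region_end: inclusive coordinates of the region of interest.
    incision_sites: positions of strand breaks on this template strand; a break
        between position p and p + 1 is recorded as p.
    Toy assumption: a polymerase cannot copy across a strand break.
    """
    return not any(region_start <= p < region_end for p in incision_sites)


# An intact strand can still template a full sized product.
print(can_yield_full_size_product(100, 600, []))
# A strand incised inside the region of interest cannot.
print(can_yield_full_size_product(100, 600, [350]))
```

Under these assumptions a strand incised anywhere between the ends of the region of interest yields only truncated copies, which is why such molecules drop out of the final amplified population.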

In a preferred embodiment the mismatch repair
system is the methyl-directed mismatch repair system of
E. coli and the components are MutS, MutL, and MutH proteins,
and ATP.

Methods to remove from a population molecules containing
a base pair mismatch

In a further embodiment the invention features a
method for removing DNA duplex molecules containing base
pair mismatches in a population of heteroduplex DNA
molecules produced from different sources. The method
comprises contacting the population of DNA duplex
molecules potentially containing base pair mismatches with
some or all components of a mismatch repair system under
conditions such that the component or components form a
complex with the DNA having a base pair mismatch, and not
with a DNA duplex lacking a base pair mismatch, and
removing DNA molecules containing the complex or the
product of the complex.
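The outcome of such a removal step can be pictured with a short computational sketch. It is offered as an illustration under stated assumptions, not as a description of the biochemical procedure: each duplex is represented as a pair of strings (top strand written 5' to 3', bottom strand written 3' to 5'), the helper names `mismatch_positions` and `fractionate` are hypothetical, and the two 11-nucleotide duplexes are example inputs only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(top, bottom):
    """Positions where two antiparallel strands fail to Watson-Crick pair.

    top is written 5'->3' and bottom 3'->5', so index i of top faces
    index i of bottom.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return [i for i, (a, b) in enumerate(zip(top, bottom)) if COMPLEMENT[a] != b]


def fractionate(duplexes):
    """Split duplexes into (mismatch_free, mismatched) pools."""
    matched, mismatched = [], []
    for top, bottom in duplexes:
        pool = mismatched if mismatch_positions(top, bottom) else matched
        pool.append((top, bottom))
    return matched, mismatched


population = [
    ("CTCGAGAGCTT", "GAGCTCTCGAA"),  # fully paired duplex
    ("CTCGAGAGCTT", "GAGCTTTCGAA"),  # one G-T mispair at index 5
]
mismatch_free, mismatched = fractionate(population)
print(len(mismatch_free), len(mismatched))
```

In the method itself the discrimination is made by the mismatch repair proteins rather than by sequence comparison; the sketch only shows which molecules are expected to end up in each pool.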

By "product of the complex" is meant a DNA duplex
that has incorporated biotinylated nucleotides.

By "some or all components of a mismatch repair
system" is meant either a complete mismatch repair system
such that the complete reaction is carried out or only the
proteins of the system which specifically bind to the
mismatch.

In preferred embodiments the mismatch repair
system is the methyl-directed mismatch repair system of
E. coli; some or all proteins of the mismatch repair system
have been affixed to a solid support and removal is by
adsorption; the complex interacts with other cellular
proteins, and removal of the complex occurs through the
interaction; and the conditions include the use of
biotinylated nucleotides such that the nucleotides are
incorporated into duplex molecules that contained a base
pair mismatch and such duplexes are removed by binding to
avidin.

By "some or all proteins" is meant, for example,
E. coli proteins MutS, MutL, and MutH.

By "attached to a solid support" is meant a
means, such as by fusion with glutathione transferase, by
which a protein is attached to a solid support system and
still remains functional.

By "adsorption" is meant specific binding to some
or all of the proteins of the mismatch repair system
affixed to a solid support so that separation from other
molecules that do not bind to the solid support affixed
proteins occurs.

By "interacts with other cellular proteins" is
meant interaction between mismatch repair system proteins
or between those proteins and other proteins, for
example, the interaction of MutS bound to a duplex DNA
containing a mismatch with MutL or RecA.

Kits containing a mismatch repair system 

In a preferred embodiment, a kit for correcting
base pair mismatches in duplex DNA molecules includes one or
more of the following purified components: an aliquot of
E. coli MutH, MutL, and MutS proteins or species variations
thereof, an aliquot of DNA helicase II, an aliquot of
single-strand DNA binding protein, an aliquot of DNA
polymerase III holoenzyme, an aliquot of exonuclease I, an
aliquot of Exo VII or RecJ, an aliquot of DNA ligase, an
aliquot of ATP, and an aliquot of four deoxynucleoside
triphosphates.

A further embodiment of this aspect of this
invention includes an assay kit for eliminating DNA
molecules containing one or more base pairing mismatches
from a population of heterohybrid duplex molecules formed
by base pairing of single-stranded DNA molecules obtained
from a first and a second source comprising one or more of
the following components: an aliquot of proteins of a
mismatch repair system, and an aliquot of a recombinase
protein.

By "proteins of a mismatch repair system" are
meant proteins that modulate the activity of a recombinase
protein.

In a preferred embodiment, the proteins of the
mismatch correction system are the MutS and MutL proteins
of E. coli.

Another aspect of the invention features an assay
kit for removing DNA molecules containing one or more base
pair mismatches comprising an aliquot of one or more
proteins of a mismatch repair system that have been
affixed to a column support.

In a preferred embodiment, the protein of the 
mismatch repair system is the MutS protein of E. coli. 

Another aspect of the invention features a kit
for fractionating a heteroduplex DNA population into two
pools, one of which was mismatch-free at the beginning of
the procedure, the second of which represents duplexes
that contained mispaired bases at the beginning of the
procedure. This kit is comprised of one or more of the
following components: an aliquot of all components of a
complete mismatch repair system; an aliquot of
biotinylated nucleotides; and an aliquot of avidin or an
avidin-based support.

In a preferred embodiment, the mismatch repair 
system is from E. coli and consists of products of the 
mutH, mutL, and mutS genes, DNA helicase II, single-strand 
DNA binding protein, DNA polymerase III holoenzyme, 
exonuclease I, exonuclease VII or RecJ exonuclease, DNA 
ligase, and ATP. 

The following Examples are provided for further 
illustrating various aspects and embodiments of the 
present invention and are in no way intended to be 
limiting of the scope. 

Example 1. DNA Mismatch Correction in a Defined System

In order to address the biochemistry of methyl-directed
mismatch correction, the reaction has been
assayed in vitro using the type of substrate illustrated
in Fig. 1. Application of this method to cell-free
extracts of E. coli (A.-L. Lu, S. Clark, P. Modrich, Proc.
Natl. Acad. Sci. USA 80, 4639, 1983) confirmed in vivo
findings that methyl-directed repair requires the products
of four mutator genes, mutH, mutL, mutS and uvrD (also
called mutU), and also demonstrated a requirement for the
E. coli single-strand DNA binding protein (SSB). The
dependence of in vitro correction on mutH, mutL and mutS
gene products has permitted isolation of these proteins in
near homogeneous, biologically active forms. The MutS
protein binds to mismatched DNA base pairs; the MutL
protein binds to the MutS-heteroduplex complex (M.
Grilley, K. M. Welsh, S.-S. Su, P. Modrich, J. Biol. Chem.
264, 1000, 1989); and the 25-kD MutH protein possesses a
latent endonuclease that incises the unmethylated strand
of a hemimethylated d(GATC) site (K. M. Welsh, A.-L. Lu,
S. Clark, P. Modrich, J. Biol. Chem. 262, 15624, 1987),
with activation of this activity depending on interaction
of MutS and MutL with a heteroduplex in the presence of
ATP (P. Modrich, J. Biol. Chem. 264, 6597, 1989).
However, these three Mut proteins together with SSB and
the DNA helicase II product of the uvrD (mutU) gene (I. D.
Hickson, H. M. Arthur, D. Bramhill, P. T. Emmerson, Mol.
Gen. Genet. 190, 265, 1983) are not sufficient to mediate
methyl-directed repair. Below is described identification
of the remaining required components and reconstitution of
the reaction in a defined system.

Protein and cofactor requirements for mismatch
correction. Methyl-directed mismatch correction occurs by
an excision repair reaction in which as much as several
kilobases of the unmethylated DNA strand is excised and
resynthesized (A.-L. Lu, K. Welsh, S. Clark, S.-S. Su, P.
Modrich, Cold Spring Harbor Symp. Quant. Biol. 49, 589,
1984). DNA polymerase I, an enzyme that functions in a
number of DNA repair pathways, does not contribute in a
major way to methyl-directed correction since extracts
from a polA deletion strain exhibit normal levels of
activity. However, extracts derived from a dnaZts strain
are temperature sensitive for methyl-directed repair in
vitro (Table 1).

Table 1. Requirement for τ and γ Subunits of DNA
Polymerase III Holoenzyme in Mismatch Repair

[Table body not recoverable from the source; surviving
column units: (fmol/h/mg) and (42°/34°).]




Extracts from strains AX727 (lac thi strR dnaZ2016) and
AX729 (as AX727 except purE dnaZ+) were prepared as
described (A.-L. Lu, S. Clark, P. Modrich, Proc. Natl.
Acad. Sci. USA 80, 4639, 1983). Samples (110 µg of
protein) were mixed with 0.8 µl of 1 M KCl and water to
yield a volume of 7.2 µl, and preincubated at 42° or 34°
C for 2.5 minutes. All heated samples were then placed at
34° C and supplemented with 2.2 µl of a solution
containing 0.1 µg (24 fmol) of hemimethylated G-T
heteroduplex DNA, 16 ng of MutL protein, 50 ng of MutS
protein, and buffer and nucleotide components of the
mismatch correction assay (A.-L. Lu, S. Clark, P. Modrich,
Proc. Natl. Acad. Sci. USA 80, 4639, 1983). DNA
polymerase III holoenzyme (57 ng in 0.6 µl) or enzyme
buffer was then added, and incubation at 34° C was
continued for 60 min. Heated extracts were supplemented
with purified MutL and MutS proteins because these
components are labile at 42° C. Activity measurements
reflect the correction of heteroduplex sites.

The dnaZ gene encodes the τ and γ subunits of
DNA polymerase III holoenzyme (M. Kodaira, S. B. Biswas,
A. Kornberg, Mol. Gen. Genet. 192, 80, 1983; D. A. Mullin,
C. L. Woldringh, J. M. Henson, J. R. Walker, Mol. Gen.
Genet. 192, 73, 1983), and mismatch correction activity is
largely restored to heated extracts of the
temperature-sensitive mutant strain by addition of purified
polymerase III holoenzyme. Since DNA polymerase III holoenzyme is
highly processive, incorporating thousands of nucleotides
per DNA binding event, the involvement of this activity is
consistent with the large repair tracts associated with
the methyl-directed reaction.

Additional data indicate that purified MutH,
MutL, and MutS proteins, DNA helicase II, SSB, and DNA
polymerase III holoenzyme support methyl-directed mismatch
correction, but this reaction is inhibited by DNA ligase,
an enzyme that is shown below to be required to restore
covalent continuity to the repaired strand. This
observation led to isolation of a 55-kD stimulatory
protein that obviates ligase inhibition. The molecular
weight and N-terminal sequence of this protein indicated
identity to exonuclease I (G. J. Phillips and S. R.
Kushner, J. Biol. Chem. 262, 455, 1987), and homogeneous
exonuclease I readily substitutes for the 55-kD
stimulatory activity (Table 2). Thus, exonuclease I and
the six activities mentioned above mediate efficient
methyl-directed mismatch correction in the presence of
ligase to yield product molecules in which both DNA
strands are covalently continuous.



Table 2. Stimulation of in vitro Methyl-Directed
Correction by Exonuclease I.

Protein added        Mismatch correction
                     (fmol/20 min)

None                 1
55-kD protein        18
Exonuclease I        18



Reactions (10 µl) contained 0.05 M HEPES (potassium salt,
pH 8.0), 0.02 M KCl, 6 mM MgCl2, bovine serum albumin (0.05
mg/ml), 1 mM dithiothreitol, 2 mM ATP, 100 µM (each) dATP,
dCTP, dGTP, and dTTP, 25 µM β-NAD+, 0.1 µg of
hemimethylated, covalently closed G-T heteroduplex DNA
(Fig. 1, methylation on c strand, 24 fmol), 0.26 ng of
MutH (K. M. Welsh, A.-L. Lu, S. Clark, P. Modrich, J.
Biol. Chem. 262, 15624, 1987), 17 ng of MutL (M. Grilley,
K. M. Welsh, S.-S. Su, P. Modrich, J. Biol. Chem. 264,
1000, 1989), 35 ng of MutS (S.-S. Su and P. Modrich,
Proc. Natl. Acad. Sci. USA 83, 5057, 1986), 200 ng of SSB
(T. M. Lohman, J. M. Green, R. S. Beyer, Biochemistry 25,
21, 1986; U.S. Biochemical Corp.), 10 ng of DNA helicase
II (K. Kumura and M. Sekiguchi, J. Biol. Chem. 259, 1560,
1984), 20 ng of E. coli DNA ligase (U.S. Biochemical
Corp.), 95 ng of DNA polymerase III holoenzyme (C. McHenry
and A. Kornberg, J. Biol. Chem. 252, 6478, 1977), and 1 ng
of 55-kD protein or exonuclease I (U.S. Biochemical Corp.)
as indicated. Reactions were incubated at 37° C for 20
minutes, quenched at 55° C for 10 minutes, chilled on ice,
and then digested with Xho I or Hind III endonuclease to
monitor correction. Repair of the G-T mismatch yielded
only the G-C containing, Xho I-sensitive product.
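As a consistency check on the substrate quantities given in this legend, the stated mass (0.1 µg) and molar amount (24 fmol) of heteroduplex can be related to one another. The sketch below is illustrative only. It assumes an average mass of 650 g/mol per base pair, a common approximation that is not stated in this document, and the helper name `duplex_length_bp` is hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; an assumed textbook approximation


def duplex_length_bp(mass_g, amount_mol, bp_mass=AVG_BP_MASS):
    """Estimate duplex length from a mass and the molar amount it represents."""
    return mass_g / amount_mol / bp_mass


# 0.1 microgram of heteroduplex is stated to be 24 fmol.
length = duplex_length_bp(0.1e-6, 24e-15)

# Molar concentration of heteroduplex in the 10 microliter reaction, in nM.
conc_nM = 24e-15 / 10e-6 * 1e9

print(round(length), round(conc_nM, 2))
```

Under the stated assumption the two figures imply a duplex of roughly 6.4 kilobase pairs, present at about 2.4 nM in the 10 µl reaction.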

The requirements for repair of a covalently
closed G-T heteroduplex (Fig. 1) are summarized in Table
3 (Closed circular). No detectable repair was observed in
the absence of MutH, MutL, or MutS proteins or in the
absence of DNA polymerase III holoenzyme, and omission of
SSB or exonuclease I reduced activity by 85 to 90 percent.
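The stated 85 to 90 percent reduction follows directly from the closed circular entries of Table 3. The short calculation below is a sketch using the values as read from that table (complete reaction 15 fmol/20 min; minus SSB and minus exonuclease I each 2 fmol/20 min); the function name `percent_reduction` is hypothetical.

```python
def percent_reduction(complete, omitted):
    """Percent loss of activity relative to the complete reaction."""
    return 100.0 * (complete - omitted) / complete


complete = 15.0  # fmol/20 min, complete reaction (closed circular substrate)
omissions = {"SSB": 2.0, "exonuclease I": 2.0}  # fmol/20 min

for name, value in omissions.items():
    print(name, round(percent_reduction(complete, value)))
```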

Table 3. Protein and Cofactor Requirements for Mismatch
Correction in a Defined System.

Reaction conditions            Mismatch correction (fmol/20 min)

                               Closed Circular   Open Circular
                               Heteroduplex      Heteroduplex
                                                 (no MutH,
                                                 no ligase)

Complete                       15                17
minus MutH                     <1
minus MutL                     <1                <1
minus MutS                     <1                <1
minus DNA polymerase
  III holoenzyme               <1                <1
minus SSB                      2                 1.4
minus exonuclease I            2                 <1
minus DNA helicase II          16                15
minus helicase II,
  plus immune serum            <1                <1
minus helicase II,
  plus pre-immune serum        14                NT
minus Ligase/NAD+              14                NT
minus MgCl2                    <1                NT
minus ATP                      <1                NT
minus dNTP's                   <1                NT

Reactions utilizing covalently closed G-T heteroduplex
(modification on c strand) were performed as described in
the legend to Table 2 except that 1.8 ng of exonuclease I
was used. Repair of open circular DNA was performed in a
similar manner except that MutH, DNA ligase, and β-NAD+
were omitted from all reactions, and the hemimethylated
G-T heteroduplex (modification on c strand) had been incised
with MutH protein as described in the legend to Fig. 4.
When present, rabbit antiserum to helicase II or
pre-immune serum (5 µg protein) was incubated at 0° C for 20
minutes with reaction mixtures lacking MgCl2; the cofactor
was then added and the assay was performed as above.
Although not shown, antiserum inhibition was reversed by
the subsequent addition of more helicase II. With the
exception of the DNA polymerase III preparation, which
contained about 15% by weight DNA helicase II (text), the
purity of individual protein fractions was ≥95%. NT,
not tested.

These findings are in accord with previous
conclusions concerning requirements of the methyl-directed
reaction. However, in contrast to observations in vivo
and in crude extracts indicating a requirement for the
uvrD product, the reconstituted reaction proceeded readily
in the absence of added DNA helicase II (Table 3).
Nevertheless, the reaction was abolished by antiserum to
homogeneous helicase II, suggesting a requirement for this
activity and that it might be present as a contaminant in
one of the other proteins. Analysis of these preparations
for their ability to restore mismatch repair to an extract
derived from a uvrD (mutU) mutant and for the physical
presence of helicase II by immunoblot assay revealed that
the DNA polymerase III holoenzyme preparation contained
sufficient helicase II (13 to 15 percent of total protein
by weight) to account for the levels of mismatch
correction observed in the defined system. Similar
results were obtained with holoenzyme preparations
obtained from two other laboratories. The purified system
therefore requires all the proteins that have been
previously implicated in methyl-directed repair.

The rate of correction of the closed circular
heteroduplex was unaffected by omission of DNA ligase
(Table 3), but the presence of this activity results in
production of a covalently closed product. Incubation of
a hemimethylated, supercoiled G-T heteroduplex with all
seven proteins required for correction in the presence of
DNA ligase resulted in extensive formation of covalently
closed, relaxed, circular molecules. Production of the
relaxed DNA was dependent on MutS (Fig. 2) and MutL
proteins, and the generation of this species was
associated with heteroduplex repair (Fig. 2). Correction
also occurred in the absence of ligase, but in this case
repair products were open circular molecules, the
formation of which depended on the presence of MutS (Fig.
2). Since MutS has no known endonuclease activity but
does recognize mispairs, it is inferred that open circular
molecules are the immediate product of a
mismatch-provoked excision repair process. Ligase closure of the
strand break(s) present in this species would yield the
covalently closed, relaxed circular product observed with
the complete system.
The set of purified activities identified here
as being important in methyl-directed repair support
efficient correction. In the experiments summarized in
Table 3, the individual proteins were used at the
concentrations estimated to be present in the standard
crude extract assay for correction as calculated from
known specific activity determinations. Under such
conditions the rate and extent of mismatch repair in the
purified system are essentially identical to those
observed in cell-free extracts.

DNA sites involved in repair by the purified
system. The single d(GATC) sequence within the G-T
heteroduplex shown in Fig. 1 is located 1024 base pairs
from the mispair. Despite the distance separating these
two sites, correction of the mismatch by the purified
system responded to the state of modification of the
d(GATC) sequence as well as its presence within the
heteroduplex (Fig. 3). A substrate bearing d(GATC)
methylation on both DNA strands did not support mismatch
repair nor did a related heteroduplex in which the d(GATC)
sequence was replaced by d(GATT). However, each of the
two hemimethylated heteroduplexes was subject to
strand-specific correction, with repair in each case being
restricted to the unmodified DNA strand. With a
heteroduplex in which neither strand was methylated, some
molecules were corrected on one strand, and some were
corrected on the other. As can be seen, the
hemimethylated heteroduplex bearing methylation on the
complementary DNA strand was a better substrate than the
alternative configuration in which modification was on the
viral strand, with a similar preference for repair of the
viral strand being evident with the substrate that was
unmethylated on either strand. This set of responses of
the purified system to the presence and state of
modification of d(GATC) sites reproduces effects previously
documented in vivo and in crude extract experiments (R. S.
Lahue, S.-S. Su, P. Modrich, Proc. Natl. Acad. Sci. USA
84, 1482, 1987).
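The qualitative strand-selection rules described in this paragraph can be summarized as a small decision function. This is a sketch for illustration only: it captures which strand is open to correction, not the quantitative preference for the viral strand, and the function name `repaired_strand` and its return labels are hypothetical.

```python
def repaired_strand(has_gatc, c_methylated, v_methylated):
    """Strand(s) open to correction under the d(GATC) rules described above.

    Returns "none", "c", "v", or "either".
    """
    if not has_gatc or (c_methylated and v_methylated):
        return "none"    # no d(GATC) site, or site methylated on both strands
    if c_methylated:
        return "v"       # repair is restricted to the unmodified strand
    if v_methylated:
        return "c"
    return "either"      # unmethylated duplex: some molecules on each strand


for state in [(True, True, True), (False, False, False),
              (True, True, False), (True, False, True), (True, False, False)]:
    print(state, repaired_strand(*state))
```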



Table 4. Correction Efficiencies for Different
Mismatches.

                                         c+ v-           c- v+
Heteroduplex             Markers      Rate    Bias    Rate    Bias

c 5'-CTCGA G AGCTT       Xho I        1.2     >18     0.38    >5
v 3'-GAGCT T TCGAA       Hind III

c 5'-CTCGA G AGCTG       Xho I        1.1     >17     0.38    >6
v 3'-GAGCT G TCGAC       Pvu II

c 5'-ATCGA T AGCTT       Cla I        1.0     >16     0.24    3
v 3'-TAGCT T TCGAA       Hind III

c 5'-ATCGA A AGCTT       Hind III     0.88    >20     0.20    >7
v 3'-TAGCT A TCGAA       Cla I

c 5'-CTCGA A AGCTT       Hind III     0.61    17      0.28    >5
v 3'-GAGCT C TCGAA       Xho I

c 5'-GTCGA C AGCTT       Sal I        0.60    12      0.23    >4
v 3'-CAGCT T TCGAA       Hind III

c 5'-GTCGA A AGCTT       Hind III     0.44    >13     0.21    5
v 3'-CAGCT G TCGAA       Sal I

c 5'-CTCGA C AGCTG       Pvu II       0.04    NS      <0.04   NS
v 3'-GAGCT C TCGAC       Xho I

Table 4 (continued). Correction of the eight possible
base-base mispairs was tested with the set of covalently
closed heteroduplexes described previously including the
G-T substrate shown in Fig. 1. With the exception of the
mispair and the variations shown at the fifth position on
either side, all heteroduplexes were identical in
sequence. Each DNA was tested in both hemimethylated
configurations under complete reaction conditions (Table
3, closed circular heteroduplex) except that samples were
removed at 5-minute intervals over a 20-minute period in
order to obtain initial rates (fmol/min). c and v refer
to complementary and viral DNA strands, and Bias indicates
the relative efficiency of mismatch repair occurring on
the two DNA strands (ratio of unmethylated to methylated)
as determined 60 minutes after the reaction was started.
NS, not significant. With the exception of the C-C
heteroduplexes, repair in the absence of MutS protein was
less than 20% (in most cases <10%) of that observed in
its presence (not shown).
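The restriction markers listed in Table 4 can be checked against the strand sequences, since correction to the sequence of a given strand creates the recognition site named for that strand. The sketch below is illustrative; it uses the standard recognition sequences of the five enzymes, takes the G-T heteroduplex as its example, and the helper name `marker_for` is hypothetical. Because the v strand is written 3' to 5' in the table, it is reversed before searching.

```python
SITES = {
    "Xho I": "CTCGAG",
    "Hind III": "AAGCTT",
    "Pvu II": "CAGCTG",
    "Cla I": "ATCGAT",
    "Sal I": "GTCGAC",
}


def marker_for(strand_5_to_3):
    """Names of enzymes whose site occurs in the given strand (read 5'->3')."""
    return [name for name, site in SITES.items() if site in strand_5_to_3]


# G-T heteroduplex: c strand as written, v strand written 3'->5'.
c_strand = "CTCGAGAGCTT"
v_strand_3_to_5 = "GAGCTTTCGAA"

print(marker_for(c_strand))
print(marker_for(v_strand_3_to_5[::-1]))
```

For this heteroduplex the c strand carries the Xho I site and the v strand the Hind III site, in agreement with the Markers column.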



The efficiency of repair by the methyl-directed
pathway depends not only on the nature of the mispair, but
also on the sequence environment in which the mismatch is
embedded (P. Modrich, Ann. Rev. Biochem. 56, 435, 1987).
To assess the mismatch specificity of the purified system
under conditions where sequence effects are minimized, a
set of heteroduplexes was used in which the location and
immediate sequence environment of each mispair are
essentially identical (S.-S. Su, R. S. Lahue, K. G. Au,
P. Modrich, J. Biol. Chem. 263, 6829, 1988). This
analysis (Table 4) showed that the purified system is able
to recognize and repair in a methyl-directed manner seven
of the eight possible base-base mismatches, with C-C being
the only mispair that was not subject to significant
correction. Table 4 also shows that the seven corrected
mismatches were not repaired with equal efficiency and
that in the case of each heteroduplex, the hemimethylated
configuration modified on the complementary DNA strand was
a better substrate than the other configuration in which
the methyl group was on the viral strand. These findings
are in good agreement with patterns of repair observed
with this set of heteroduplexes in E. coli extracts
(Although the patterns of substrate activity observed in
extracts and in the purified system are qualitatively
identical, the magnitude of variation observed differs for
the two systems. Hemimethylated heteroduplexes modified on
the complementary DNA strand are better substrates in both
systems, but in extracts such molecules are repaired at
about twice the rate of molecules methylated on the viral
strand. In the purified system these relative rates
differ by factors of 2 to 4. A similar effect may also
exist with respect to mismatch preference within a given
hemimethylated family. Although neither system repairs
C-C, the rates of repair of other mismatches vary by
factors of 1.5 to 2 in extracts but by factors of 2 to 3
in the defined system.).
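The statement that the two hemimethylated configurations differ in rate by factors of about 2 to 4 in the purified system can be checked against the initial rates of Table 4. The sketch below is illustrative only. The rates are transcribed from that table, the first value in each pair being the configuration methylated on the complementary strand, and the mispair labels are read off the heteroduplex sequences rather than stated in the table.

```python
# mispair: (rate with c strand methylated, rate with v strand methylated),
# both in fmol/min, transcribed from Table 4. C-C is omitted because its
# repair was not significant.
rates = {
    "G-T": (1.2, 0.38),
    "G-G": (1.1, 0.38),
    "T-T": (1.0, 0.24),
    "A-A": (0.88, 0.20),
    "A-C": (0.61, 0.28),
    "C-T": (0.60, 0.23),
    "A-G": (0.44, 0.21),
}

ratios = {mispair: c_meth / v_meth for mispair, (c_meth, v_meth) in rates.items()}

for mispair, ratio in ratios.items():
    print(mispair, round(ratio, 1))
```

The computed ratios fall between roughly 2.1 and 4.4, in line with the range quoted in the text.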

Strand-specific repair directed by a DNA strand
break. Early experiments on methyl-directed repair in
E. coli extracts led to the proposal that the
strand-specificity of the reaction resulted from endonucleolytic
incision of an unmethylated DNA strand at a d(GATC)
sequence. This idea was supported by the finding that
purified MutH protein has an associated, but extremely
weak d(GATC) endonuclease that is activated in a
mismatch-dependent manner in a reaction requiring MutL, MutS, and
ATP. The purified system has been used to explore this
effect more completely.

The two hemimethylated forms of the G-T
heteroduplex shown in Fig. 1 were incised using high
concentrations of purified MutH protein to cleave the
unmethylated DNA strand at the d(GATC) sequence
(↓pGpApTpC). After removal of the protein, these open
circular heteroduplexes were tested as substrates for the
purified system in the absence of DNA ligase. Both open
circular species were corrected in a strand-specific
manner and at rates similar to those for the corresponding
covalently closed heteroduplexes (Fig. 4). As observed
with closed circular heteroduplexes, repair of the
MutH-cleaved molecules required MutL, MutS, SSB, DNA polymerase
III holoenzyme, and DNA helicase II (Fig. 4 and open
circular entries of Table 3), but in contrast to the
behavior of the closed circular substrates, repair of the
mismatch within the open circular molecules occurred
readily in the absence of MutH protein. Thus prior
incision of the unmethylated strand of a d(GATC) site can
bypass the requirement for MutH protein in strand-specific
mismatch correction.

The nature of the MutH-independent repair was
examined further to assess the effect of ligase on the
reaction and to determine whether a strand break at a
sequence other than d(GATC) can direct correction in the
absence of MutH protein (Fig. 5). As mentioned above, a
covalently closed G-T heteroduplex that lacks a d(GATC)
sequence is not subject to repair by the purified system
in the presence (Fig. 3) or absence of DNA ligase.
However, the presence of one strand-specific,
site-specific break is sufficient to render this heteroduplex
a substrate for the purified system in the absence of
ligase and MutH protein (Fig. 5). Repair of this open
circular heteroduplex was limited to the incised,
complementary DNA strand, required presence of MutL and
MutS proteins, DNA polymerase III, and SSB, and correction
of the molecule was as efficient as that observed with the
hemimethylated heteroduplex that had been cleaved by MutH
at the d(GATC) sequence within the complementary strand.
Although the presence of a strand break is sufficient to
permit strand-specific correction of a heteroduplex in the
absence of MutH and ligase, the presence of the latter
activity inhibited repair not only on the heteroduplex
lacking a d(GATC) sequence but also on both hemimethylated
molecules that had been previously incised with MutH
protein (Fig. 5). This inhibition by ligase was
circumvented by the presence of MutH protein, but only if
the substrate contained a d(GATC) sequence, with this
effect being demonstrable when both types of heteroduplex
were present in the same reaction (Fig. 5, last column).
This finding proves that MutH protein recognizes d(GATC)
sites and is consistent with the view that the function of
this protein in mismatch correction is the incision of the
unmethylated strand at this sequence.

Example 2: Purification of MutY Protein

Purification of MutY Protein. E. coli RK1517 was
grown at 37° C in 170 liters of L broth containing 2.5 mM
KH2PO4, 7.5 mM Na2HPO4 (culture pH = 7.4) and 1% glucose.
The culture was grown to an A590 of 4, chilled to 10° C and
cells were harvested by continuous flow centrifugation.
Cell paste was stored at -70° C. A summary of the MutY
purification is presented in Table 1. Fractionation
procedures were performed at 0°-4° C, centrifugation was
at 13,000 x g, and glycerol concentrations are expressed
as volume percent.

Frozen cell paste (290 g) was thawed at 4° C,
resuspended in 900 ml of 0.05 M Tris-HCl (pH 7.5), 0.1 M
NaCl, 1 mM dithiothreitol, 0.1 mM EDTA, and cells were
disrupted by sonication. After clarification by
centrifugation for 1 h, the lysate (Fraction I, 970 ml)
was treated with 185 ml of 25% streptomycin sulfate
(wt/vol in 0.05 M Tris-HCl (pH 7.5), 0.1 M NaCl, 1 mM
dithiothreitol, 0.1 mM EDTA) which was added slowly with
stirring. After 30 min of additional stirring, the
solution was centrifuged for 1 h, and the supernatant
(1120 ml) was treated with 252 g of solid ammonium sulfate
which was added slowly with stirring. After 30 min of
additional stirring, the precipitate was collected by
centrifugation for 1 h, resuspended to a final volume of
41 ml in 0.02 M potassium phosphate (pH 7.5), 0.1 mM EDTA,
10% (vol/vol) glycerol, 1 mM dithiothreitol, and dialyzed
against two 2 l portions of 0.02 M potassium phosphate (pH
7.5), 0.1 M KCl, 0.1 mM EDTA, 1 mM dithiothreitol, 10%
glycerol (2 h per change). The dialyzed material was
clarified by centrifugation for 10 min to yield Fraction
II (45 ml).
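The quantities in the streptomycin sulfate and ammonium sulfate steps can be restated as final concentrations. The calculation below is a sketch that assumes additive volumes; the helper name `final_percent_wv` is hypothetical, and the figures are those given in the paragraph above.

```python
def final_percent_wv(stock_percent, stock_ml, sample_ml):
    """Final % (wt/vol) after adding a stock solution to a sample,
    assuming the two volumes are additive."""
    return stock_percent * stock_ml / (stock_ml + sample_ml)


# 185 ml of 25% (wt/vol) streptomycin sulfate added to 970 ml of Fraction I.
streptomycin = final_percent_wv(25.0, 185.0, 970.0)

# 252 g of solid ammonium sulfate added to 1120 ml of supernatant.
ammonium_sulfate_g_per_ml = 252.0 / 1120.0

print(round(streptomycin, 2), round(ammonium_sulfate_g_per_ml, 3))
```

On these assumptions streptomycin sulfate is brought to about 4% (wt/vol), and ammonium sulfate is added at 0.225 g per ml of supernatant.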
Fraction II was diluted 10-fold into 0.02 M
potassium phosphate (pH 7.5), 0.1 mM EDTA, 1 mM
dithiothreitol, 10% glycerol so that the conductivity of
the diluted solution was comparable to that of the
dilution buffer containing 0.1 M KCl. The dilution was
performed on small aliquots of Fraction II, and diluted
samples were immediately loaded at 1 ml/min onto a 14.7 cm
x 12.6 cm² phosphocellulose column equilibrated with 0.02
M potassium phosphate (pH 7.5), 0.1 M KCl, 0.1 mM EDTA, 1
mM dithiothreitol, 10% glycerol. The column was washed
with 400 ml of equilibration buffer, and developed with a
2 liter linear gradient of KCl (0.1 to 1.0 M) in 0.02 M
potassium phosphate (pH 7.5), 0.1 mM EDTA, 1 mM
dithiothreitol, 10% glycerol. Fractions containing MutY
activity, which eluted at about 0.4 M KCl, were pooled
(Fraction III, 169 ml).

Fraction III was dialyzed against two 500 ml
portions of 5 mM potassium phosphate (pH 7.5), 0.05 M KCl,
0.1 mM EDTA, 1 mM dithiothreitol, 10% glycerol (2 h per
change) until the conductivity was comparable to that of
the dialysis buffer. After clarification by
centrifugation for 10 min, the solution was loaded at
0.5 ml/min onto a 21 cm x 2.84 cm² hydroxylapatite column
equilibrated with 5 mM potassium phosphate (pH 7.5), 0.05
M KCl, 0.1 mM EDTA, 1 mM dithiothreitol, 10% glycerol.
After washing with 130 ml of equilibration buffer, the
column was eluted with a 600 ml linear gradient of
potassium phosphate (5 mM to 0.4 M, pH 7.5) containing
0.05 M KCl, 1 mM dithiothreitol, 10% glycerol. Fractions
eluting from the column were supplemented with EDTA to 0.1
mM. Peak fractions containing 60% of the total recovered
activity, which eluted at about 0.1 M potassium phosphate,
were pooled (Fraction IV, 24 ml). The remaining side
fractions contained impurities which could not be resolved
from MutY by MonoS chromatography.

Fraction IV was diluted by addition of an equal
volume of 0.1 mM EDTA, .1 dithiothreitol, 10% glycerol.
After clarification by centrifugation for 15 min, diluted
Fraction IV was loaded at 0.75 ml/min onto a Pharmacia HR
5/5 MonoS FPLC column that was equilibrated with 0.05 M
sodium phosphate (pH 7.5), 0.1 M NaCl, 0.1 mM EDTA, 0.5 mM
dithiothreitol, 10% glycerol. The column was washed at
0.5 ml/min with 17 ml of equilibration buffer and
developed at 0.5 ml/min with a

Table 1. Purification of MutY protein from 290 g of E.
coli RK1517

Fraction  Step              Total protein  Specific activity  Yield
                            (mg)           (units/mg)         (percent)
I         Extract           10,900         40                 (100)
II        Ammonium sulfate  1,350          272                84
III       Phosphocellulose  66             10,800             160
IV        Hydroxylapatite   1.4            136,000            44
V         MonoS             0.16           480,000            18

Specific A·G to C·G mismatch correction in cell-free
extracts was determined as described previously (Au
et al. 1988), except that ATP and glutathione were omitted
from the reaction and incubation was for 30 min instead of
1 h. For complementation assays, each 0.01 ml reaction
contained RK1517-Y33 extract (mutS mutY) at a
concentration of 10 mg/ml protein. One unit of MutY
activity is defined as the amount required to convert 1
fmol of A·G mismatch to C·G base pair per h under
complementation conditions.

20 ml linear gradient of NaCl (0.1 to 0.4 M) in
0.05 M sodium phosphate (pH 7.5), 0.1 mM EDTA, 0.5 mM
dithiothreitol, 10% glycerol. Fractions with MutY
activity, which eluted at approximately 0.2 M NaCl, were
pooled (Fraction V, 2.6 ml). Fraction V was divided into
small aliquots and stored at -70°C.

Assay for MutY-dependent, A·G-specific glycosylase

DNA restriction fragments were labeled at either
the 3' or 5' ends with 32P. Glycosylase activity was then
determined in 0.01 ml reactions containing 10 ng end-labeled
DNA fragments, 0.02 M Tris-HCl, pH 7.6, 1 mM EDTA,
0.05 mg/ml bovine serum albumin, and 2.7 ng MutY. After
incubation at 37°C for 30 min, the reaction mixture was
treated with 2.5 x 10^-3 units of HeLa AP endonuclease II in
the presence of 11 mM MgCl2 and 0.005% Triton X-100 for 10
min at 37°C. Reactions were quenched by the addition of
an equal volume of 80% formamide, 0.025% xylene cyanol,
0.025% bromphenol blue, heated to 80°C for 2 min, and the
products analyzed on an 8% sequencing gel. Control
reactions contained either no MutY, no A·G mismatch or no
AP endonuclease II.

Strand cleavage at the AP site generated by MutY
could also be accomplished by treatment with piperidine
instead of treatment with AP endonuclease II. After
incubation for 30 min at 37°C with MutY as described
above, the reaction mixture was precipitated with ethanol
in the presence of carrier tRNA, then resuspended in 1 M
piperidine and heated at 90°C for 30 min. After two
additional ethanol precipitations, changing tubes each
time, the pellet was resuspended in a minimum volume of
water to which was added an equal volume of 80% formamide,
0.025% xylene cyanol, 0.025% bromphenol blue. The
products were then analyzed on an 8% sequencing gel.

EXAMPLE 3: Genetic Mapping of Point Mutations in the Human
Genome

The full novelty and utility of the present
invention may be further appreciated by reference to the
following brief description of selected specific
embodiments which advantageously employ various preferred
forms of the invention as applied to a common problem in
genetic mapping of point mutations in the human genome.
In the course of constructing gene linkage maps, for
example, it is frequently desirable to compare the
sequence of a cloned DNA fragment with homologous
sequences in DNA extracted from a human tissue sample.
Substantially all base pairs in the entire homologous
sequence of the cloned DNA fragment are compared to those
of the human tissue DNA, most advantageously in a single
test according to the present invention, merely by
contacting both strands of the human tissue DNA molecule
with both radiolabeled complementary strands of the second
DNA molecule under conditions such that base pairing
occurs, contacting the resulting DNA duplexes with the E.
coli MutS protein that recognizes substantially all base
pair mismatches under conditions such that the protein
forms specific complexes with its cognate mispairs, and
detecting the resulting DNA:protein complexes by
contacting the complexes with a membranous nitrocellulose
filter under conditions such that protein:DNA complexes
are retained while DNA not complexed with protein is not
retained, and measuring the amount of DNA in the retained
complexes by standard radiological methods or by
utilizing any of the other methods of the invention, e.g.,
altered electrophoretic mobility, or detection by use of
antibodies.

If the above detection test indicates the
presence of sequence differences between the human tissue
DNA and the cloned DNA and localization is required, or,
in the alternative, if such differences are suspected and
localization as well as detection of them is desired in a
first analysis, then another method of this invention may
be applied for these purposes. An embodiment of this
aspect of the invention that may be most advantageously
employed comprises the steps of contacting both strands of
the human tissue DNA molecule with both radiolabeled
complementary strands of the second DNA molecule (usually
without separation from the cloning vector DNA) under
conditions such that base pairing occurs, contacting the
resulting DNA duplexes with MutHLS to produce a GATC
cleavage reaction or a modified form of MutS protein of E.
coli to which is attached an hydroxyl radical cleaving
function under conditions such that the radical cleaving
function cleaves both strands of the DNA within about 20
base pairs of substantially all DNA base mispairs. In the
absence of any DNA base mispairs in the DNA duplexes
comprising complementary strands of the human tissue and
cloned DNAs, no DNA fragments smaller than the cloned DNA
(plus vector DNA, if still attached) would be detected.
The location of any double-stranded DNA cleavages by the
modified MutS protein is determined, to within a few kbp
or less of some restriction enzyme cleavage site within
the cloned DNA, by standard restriction enzyme mapping
approaches. If greater precision in localization and
identification of a single base difference is desired,
sequencing could be confined to those particular fragments
of cloned DNA that span at least one base sequence
difference localized by this method and are cleaved by a
restriction enzyme at the most convenient distance from
those sequence differences for direct sequencing.

The examples herein can be changed to make use
of other methods of separation to identify mismatches,
such as a filter-binding assay, as well as the nicking
reaction with MutS and MutL. While large (at least 20
kbp) or small DNA molecules can be used in these methods,
those of between 1-10 kbp are preferred.

EXAMPLE 4: DNA Mismatch Detection Kit

The kit contains MutS protein, dilution buffer,
annealing buffer, reagents to generate complementary and
mismatched control duplexes, and a filter binding protocol.
It can be used to detect single-base mismatches in
oligonucleotides.

MutS kit components:

MutS protein in storage buffer: 50 mM HEPES
pH 7.2, 100 mM KCl, 1 mM EDTA, 1 mM DTT;

MutS1: 16-mer oligonucleotide GATCCGTCGACCTGCA
(all such oligonucleotides are written 5' to 3' herein) in
water (2 µM);

MutS2: 16-mer oligonucleotide TGCAGGTCGACGGATC,
1 µM in annealing buffer: 20 mM Tris/HCl pH 7.6, 5 mM
MgCl2, 0.1 mM DTT, 0.01 mM EDTA;

MutS3: 16-mer oligonucleotide TGCAGGTTGACGGATC,
1 µM in annealing buffer;

Assay buffer/annealing buffer/wash buffer: 20 mM
Tris/HCl pH 7.6, 5 mM MgCl2, 0.1 mM DTT, 0.01 mM EDTA;

Protein storage/dilution buffer: 50 mM HEPES pH
7.2, 100 mM KCl, 1 mM EDTA, 1 mM DTT.

The DNA mismatch detection kit contains three
16-mer oligonucleotides labeled MUTS1, MUTS2, and MUTS3
for testing the performance of MutS protein. When MUTS1
and MUTS2 are annealed, a perfectly matched duplex
results. When MUTS1 and MUTS3 are annealed, a duplex
containing a single G-T mismatch results. These serve as
control substrates for MutS binding.
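
The relationship among the three control oligonucleotides can be checked directly from the sequences listed in the kit components. The following is a minimal sketch, assuming ordinary antiparallel Watson-Crick pairing of two strands each written 5' to 3'; the helper names are illustrative and are not part of the kit.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Sequences as listed in the kit components, written 5' to 3'.
MUTS1 = "GATCCGTCGACCTGCA"
MUTS2 = "TGCAGGTCGACGGATC"
MUTS3 = "TGCAGGTTGACGGATC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches(top, bottom):
    """List non-Watson-Crick pairs formed when two strands are annealed.

    Both strands are given 5' to 3'. Position i of `top` pairs with
    position len(bottom) - 1 - i of `bottom` (antiparallel pairing).
    Each entry is (0-based index in top, base in top, base in bottom).
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    found = []
    for i, base in enumerate(top):
        partner = bottom[len(bottom) - 1 - i]
        if COMPLEMENT[base] != partner:
            found.append((i, base, partner))
    return found


print(mismatches(MUTS1, MUTS2))
print(mismatches(MUTS1, MUTS3))
```

With these sequences the MUTS1/MUTS2 pair has no mispaired positions, and the MUTS1/MUTS3 pair has exactly one, a G in MUTS1 opposite a T in MUTS3, which is consistent with the single G-T mismatch described above.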

Kinase Labeling of MUTS1 Oligonucleotide

This protocol uses half the amount of
oligonucleotide contained in the kit. To a
microcentrifuge tube on ice add the following:

MUTS1 Oligonucleotide (2 µM)             15 µl (30 pmoles)
10X T4 Polynucleotide Kinase Buffer      3 µl
32P-ATP (3000 Ci/mmole)                  1 µl
ATP (10 µM)                              2.5 µl
Sterile dH2O                             7.5 µl
T4 Polynucleotide Kinase (30 units/µl)   1 µl (30 units)

Incubate the reaction mixture for 10 min at 37°C. Then
incubate 10 min at 70°C. Spot two independent 1 µl
aliquots of the mixture on a SureCheck TLC plate and also
spot a dilution of 32P-ATP (1:30 in water) in a separate
lane and run with the elution mixture. Expose the
developed plate to X-ray film for 5 min. Scrape all
radioactive spots from both experimental lanes of the
plate and count them in a liquid scintillation counter to
determine the % incorporation of label. This value is
typically 40-60%. If a significant labeled ATP spot is
present in the kinase reaction lanes on the plate, the
labeled oligonucleotide must be purified before use (TLC
or gel), since 32P-ATP will contribute to background in the
filter binding assay. In our experience, this is usually
not necessary.

Keep in mind that the MUTS1 oligo stock is 2
pmol/µl and that the final concentration should be 1
pmol/µl. It is critical that this final concentration be
as exact as possible, since the concentration determines
the amount of MUTS1 in the next (annealing) step and
hence, the amount of DNA available for binding by the
protein.
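
The final MUTS1 concentration in the kinase reaction follows from the component volumes listed above. The following is a minimal sketch of that check; the dictionary keys and variable names are illustrative assumptions.

```python
# Component volumes of the kinase reaction, in microliters, as listed above.
KINASE_REACTION_UL = {
    "MUTS1 oligonucleotide (2 pmol/ul)": 15.0,
    "10X T4 polynucleotide kinase buffer": 3.0,
    "32P-ATP": 1.0,
    "ATP": 2.5,
    "sterile water": 7.5,
    "T4 polynucleotide kinase": 1.0,
}

MUTS1_STOCK_PMOL_PER_UL = 2.0

total_volume_ul = sum(KINASE_REACTION_UL.values())
muts1_pmol = KINASE_REACTION_UL["MUTS1 oligonucleotide (2 pmol/ul)"] * MUTS1_STOCK_PMOL_PER_UL
final_pmol_per_ul = muts1_pmol / total_volume_ul

print(total_volume_ul, muts1_pmol, final_pmol_per_ul)
```

The listed volumes sum to 30 µl containing 30 pmoles of MUTS1, which gives the 1 pmol/µl final concentration called for in the text. Any deviation in pipetted volumes changes this value proportionally.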

Annealing Reactions

Two separate reactions are carried out:
MUTS1/MUTS2 and MUTS1/MUTS3. In both cases, the 32P-labeled
MUTS1 from Step 1 is used.

Complementary                         Mismatched
MUTS1 (kinased)   14 µl = 14 pmol     MUTS1 (kinased)   14 µl = 14 pmol
MUTS2 (1 µM)      28 µl = 28 pmol     MUTS3 (1 µM)      28 µl = 28 pmol
annealing buffer  28 µl               annealing buffer  28 µl
Total             70 µl               Total             70 µl

1. Heat each mixture for 10 min at 70°C.

2. Incubate for 30 min at room temperature.

3. Hold on ice until ready to use.

The molar ratio of MUTS2/MUTS1 and MUTS3/MUTS1 is 2:1 in
the above reactions and this should be maintained for
optimal results. Lowering the ratio of unlabeled to
labeled strand may lead to very high background in the
filter binding assay, presumably caused by sticking of
labeled ssDNA to nitrocellulose.
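
The amounts in the annealing reactions can be tied to the 1.2 pmoles of duplex used per binding assay later in this example. The following is a minimal sketch, assuming that annealing goes to completion and that the amount of duplex is set by the limiting (labeled) strand; the function name and dictionary keys are illustrative.

```python
def annealing_summary(labeled_ul, labeled_pmol_per_ul,
                      unlabeled_ul, unlabeled_pmol_per_ul, buffer_ul):
    """Summarize one annealing reaction.

    Assumes complete annealing, so the duplex amount equals the amount
    of the limiting strand.
    """
    labeled_pmol = labeled_ul * labeled_pmol_per_ul
    unlabeled_pmol = unlabeled_ul * unlabeled_pmol_per_ul
    total_ul = labeled_ul + unlabeled_ul + buffer_ul
    return {
        "labeled_pmol": labeled_pmol,
        "unlabeled_pmol": unlabeled_pmol,
        "total_ul": total_ul,
        "molar_ratio": unlabeled_pmol / labeled_pmol,
        "duplex_pmol_per_ul": min(labeled_pmol, unlabeled_pmol) / total_ul,
    }


# 1 uM is the same as 1 pmol/ul, so both strands are at 1 pmol/ul.
reaction = annealing_summary(14.0, 1.0, 28.0, 1.0, 28.0)
duplex_in_6_ul = 6.0 * reaction["duplex_pmol_per_ul"]

print(reaction["molar_ratio"], reaction["total_ul"], round(duplex_in_6_ul, 3))
```

Under these assumptions the unlabeled to labeled ratio is 2:1, the reaction volume is 70 µl, and a 6 µl portion carries about 1.2 pmoles of duplex, which matches the amount stated for the gel shift and filter binding assays.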

Assay of MutS Binding by the Gel Shift Method

The binding of MutS to mismatches can be
assessed using the technique of Gel Shift Mobility Assay
(GSMA), a useful tool to identify protein-DNA interactions
which may regulate gene expression. Below is a protocol
for performing GSMA on the MUTS1/MUTS3 mismatched duplex
contained in the mismatch detection kit. Optimum
conditions may vary depending on the particular mismatch
being detected or the length of the oligonucleotide.

All binding reactions should be carried out on
ice. The total binding reaction volume is 10 µl. Add 4
µl of a MutS protein dilution (prepared using dilution
buffer in the kit) containing 0.5-5 pmols (0.125-1.25
units) of MutS protein (1 pmol = 97 ng) to 6 µl = 1.2
pmols of 32P-labeled MUTS1/MUTS3 heteroduplex. Also add
comparable amounts of MutS protein to labeled MUTS1/MUTS2
matched duplex to serve as a control. A control
incubation consisting only of mismatched heteroduplex (no
MutS protein) should also be run. Incubate all reactions
on ice for 30 min.

To 3 µl of the DNA/MutS mixture from each
incubation add 1 µl of a 50% w/v sucrose solution.

Load 2 µl of the mixture from Step 2 onto a 6%
non-denaturing polyacrylamide gel prepared in Tris-acetate-EDTA
(TAE) buffer (Sambrook et al., "Molecular Cloning:
A Laboratory Manual", Second Edition, Cold Spring Harbor
Laboratory, New York (1989)) to which MgCl2 has been added
to a final concentration of 1 mM, and run the gel at 10
V/cm and 4°C in TAE buffer containing 1 mM MgCl2 until
bromophenol blue dye (loaded into an adjacent well) has
migrated approximately half way down the gel. The
presence of Mg++ in the gel and running buffer is critical
for optimal results in the GSMA assay of MutS protein.

Filter Binding Assay

The total binding reaction volume is 10 µl. It
consists of 6 µl, or 1.2 pmoles, of duplex DNA and 4 µl of
a MutS protein dilution containing 0.5-5 pmoles (0.125-1.25
units) of MutS protein (1 pmol = 97 ng). Each type
of duplex, complementary and mismatched, should be assayed
in duplicate or triplicate along with a no protein control
for each annealing, which will serve as the background to
subtract.

In order to use the filter binding assay it will
be necessary to make up additional annealing buffer for
use in the washing step. Add 20 ml of 1 M Tris-HCl, pH
7.6, 5 ml of 1 M MgCl2, 0.1 ml of 1 M DTT, and 0.02 ml of
0.5 M EDTA to distilled water and bring the volume to 1
liter.
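
The wash buffer recipe above can be checked against the annealing buffer composition given in the kit components (20 mM Tris/HCl, 5 mM MgCl2, 0.1 mM DTT, 0.01 mM EDTA) using the usual dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch of that check; the names are illustrative assumptions.

```python
import math

FINAL_VOLUME_ML = 1000.0

# component: (stock concentration in mM, volume of stock added in ml)
WASH_BUFFER_RECIPE = {
    "Tris-HCl pH 7.6": (1000.0, 20.0),
    "MgCl2": (1000.0, 5.0),
    "DTT": (1000.0, 0.1),
    "EDTA": (500.0, 0.02),
}


def final_concentrations_mm(recipe, final_volume_ml):
    """Apply C1 * V1 = C2 * V2 to each component and return C2 in mM."""
    return {
        name: stock_mm * volume_ml / final_volume_ml
        for name, (stock_mm, volume_ml) in recipe.items()
    }


final_mm = final_concentrations_mm(WASH_BUFFER_RECIPE, FINAL_VOLUME_ML)
for name, conc in final_mm.items():
    print(f"{name}: {conc:g} mM")
```

The computed final concentrations agree with the annealing buffer listed among the kit components, so buffer made by this recipe should be interchangeable with the supplied buffer for the wash step.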

For each binding assay, add the following to a
0.5 ml microcentrifuge tube on ice:

MUTS1/MUTS2 (Control) OR
MUTS1/MUTS3 (Mismatched)
Annealing Mixture          6 µl

Set up the filtration apparatus and presoak the
nitrocellulose filters in annealing buffer.

Add 4 µl of MutS protein dilution to the
annealing mixtures on ice. Also include no protein
controls for each annealing.

After 30 minutes, begin filtration of samples.
Caution: use a slow rate of filtration. It should take at
least a second or two for the 10 µl sample to filter.

Immediately wash the filters with 5 ml each of
cold annealing buffer. This should take 20-30 seconds.

Place the filters in liquid scintillation vials,
add fluid and count for 2 minutes each.

Determine the input cpm for each annealing as
follows: To 6 µl of annealing mixture, add 54 µl of water
and count 2-3 aliquots of 6 µl each in scintillation
fluid. The input cpm is then 10X the average of the cpm
of the dilution.

Determine the cpm/pmole of DNA as follows:

cpm/pmole of DNA = (cpm of 6 µl aliquot x dilution x fraction of label incorporated)
                   / (pmol of DNA in annealing reaction)

A 6 µl annealing contains 1.2 pmoles of DNA.

A typical kinase reaction may give 42% incorporation
(determined previously).

A 6 µl aliquot of 10X dilution may be 10,600 cpm.

(10,600 x 10 x 0.42) / 1.2 = 37,100 cpm/pmole DNA

Determine the pmoles of DNA bound by various pmoles of
MutS. First, determine the pmoles of MutS protein in a
binding reaction:

pmoles of MutS = (concentration of MutS x volume of protein added)
                 / (molecular weight of MutS x dilution factor)

Example: If 4 µl of a 6X dilution of MutS at 250 µg/ml is
used, then:

(250 ng/µl x 4 µl) / (97 ng/pmole x 6) = 1.72 pmoles of MutS in reaction

Then, determine the pmoles of DNA bound:

pmoles of DNA bound = (cpm retained on filter with MutS protein - cpm on no protein filter)
                      / (cpm/pmole of DNA)

Example: One gets 15,470 cpm on the filter with MutS and
340 cpm with no protein.

(15,470 cpm - 340 cpm) / 37,100 cpm/pmole = 0.408 pmoles of DNA bound

Determine the number of pmoles of MutS required to bind 1
pmole of DNA (i.e., a unit of MutS).

In the above example, 1.72 pmoles of MutS bound 0.408
pmoles of DNA, such that one unit = 1.72/0.408 = 4.2
pmoles MutS per pmole DNA.
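
The worked filter binding example can be reproduced step by step from the numbers given above. The following is a minimal sketch of the same arithmetic; the function names are illustrative assumptions, and 250 µg/ml is entered as the equivalent 250 ng/µl.

```python
def cpm_per_pmol(aliquot_cpm, dilution, fraction_incorporated, pmol_dna):
    """Specific radioactivity of the duplex, following the formula in the text."""
    return aliquot_cpm * dilution * fraction_incorporated / pmol_dna


def muts_pmol(conc_ng_per_ul, volume_ul, ng_per_pmol, dilution_factor):
    """pmoles of MutS added to one binding reaction."""
    return conc_ng_per_ul * volume_ul / (ng_per_pmol * dilution_factor)


def dna_bound_pmol(cpm_with_protein, cpm_no_protein, specific_cpm_per_pmol):
    """pmoles of DNA retained on the filter after background subtraction."""
    return (cpm_with_protein - cpm_no_protein) / specific_cpm_per_pmol


specific = cpm_per_pmol(10600, 10, 0.42, 1.2)
protein = muts_pmol(250.0, 4.0, 97.0, 6.0)
bound = dna_bound_pmol(15470, 340, specific)
pmol_muts_per_unit = protein / bound

print(round(specific), round(protein, 2), round(bound, 3), round(pmol_muts_per_unit, 1))
```

The intermediate values round to 37,100 cpm/pmole, 1.72 pmoles of MutS and 0.408 pmoles of DNA bound, giving about 4.2 pmoles of MutS per unit, in agreement with the example. This is also consistent with the stated range of 0.5-5 pmoles corresponding to 0.125-1.25 units.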

EXAMPLE 5: Effects of MutS and MutL on RecA-catalyzed
Strand Transfer

A model system used to evaluate MutS and MutL
effects on RecA-catalyzed strand transfer is depicted in
Figure 6. The assay for RecA-catalyzed strand transfer
between homologous and quasi-homologous DNA sequences
employed the three strand reaction in which one strand
from a linear duplex DNA is transferred to an homologous,
single-stranded DNA circle (Cox, 78 Proc. Natl. Acad.
Sci. USA 3433, 1981). These experiments exploited the
previous observation that RecA is able to support strand
transfer between related fd and M13 DNAs (Bianchi et al.,
35 Cell 511, 1983; DasGupta et al., 79 Proc. Natl. Acad.
Sci. USA 762, 1982), which are approximately 97% homologous
at the nucleotide level. The vast majority of this
variation is due to single base pair changes.

Results of experiments on the effects of MutS
and MutL on RecA-catalyzed strand transfer between
homologous and quasi-homologous DNA sequences are shown in
Figure 7. Reactions (50 µl) contained 50 mM HEPES (pH
7.5), 12 mM MgCl2, 2 mM ATP, 0.4 mM dithiothreitol, 6 mM
phosphocreatine, 10 U/ml phosphocreatine kinase, 0.6 nM
single-stranded circular DNA (molecules), 7.6 µg RecA
protein, 0.54 µg SSB, and MutS or MutL as indicated.
Reactions were allowed to preincubate at 37°C for 10
minutes, strand exchange was initiated by addition of
linear duplex fd DNA (Rf DNA linearized by cleavage with
HpaI, 0.6 nM final concentration as molecules), and
incubation continued for 70 minutes. MutS or MutL was
added 1 minute prior to addition of duplex DNA. Samples
(50 µl) were quenched by addition of EDTA (25 mM), sodium
dodecyl sulphate (0.1%), and proteinase K (150 µg/ml),
followed by incubation at 42°C for 30 minutes.

Although the presence of MutS or MutL was without
significant effect on strand transfer between linear
duplex fd DNA and circular fd single-strands, MutS did
inhibit strand transfer between quasi-homologous linear
duplex fd DNA and M13 single-strands. Similar results
were obtained for strand transfer between duplex M13 DNA
and single-stranded fd (data not shown). In contrast,
MutL alone did not significantly alter the yield of
circular duplex product formed by RecA-catalyzed strand
transfer between these different DNAs.

EXAMPLE 6: MutL Potentiation of MutS Block to Strand 
Transfer 

Results of experiments on the MutL potentiation
of the MutS block to strand transfer in response to
mismatched base pairs are shown in Figure 8. Reaction
mixtures (210 µl) contained 50 mM HEPES (pH 7.5), 12 mM
MgCl2, 2 mM ATP, 0.4 mM dithiothreitol, 6 mM
phosphocreatine, 10 U/ml phosphocreatine kinase, 0.6 nM (molecules)
single-stranded circular DNA, 32 µg RecA protein, and 2.3
µg SSB. Reactions were preincubated for 10 minutes at
37°C and strand exchange initiated by addition of duplex
fd DNA (Rf DNA linearized by cleavage with HpaI, 0.6 nM
final concentration as molecules). When present, MutS
(2.9 µg) and/or MutL (1.3 µg) were added 1 minute prior to
addition of duplex DNA. Samples were removed at the
indicated times and quenched as described in Example 5.

MutL potentiates the inhibition of heteroduplex
formation that is observed with MutS. Formation of full
length, circular heteroduplex product is virtually
abolished in the presence of MutS and MutL. Heteroduplex
formation between perfectly homologous strands occurred
readily in the presence of either or both proteins.

EXAMPLE 7: MutS and MutL Block of Branch Migration

While MutS and MutS along with MutL blocked
formation of fully duplex, circular fd-M13 product, some
strand transfer did occur in these reactions as
demonstrated by the occurrence of strand transfer
"intermediates" that migrated more slowly in agarose gels than
fully duplex, nicked circular product (data not shown).
The nature of these structures was examined using the S1
nuclease procedure of Cox and Lehman to evaluate mean
length of stable heteroduplex formation. This analysis is
shown in Figure 9.

Reaction mixtures (510 µl) contained 50 mM HEPES
(pH 7.5), 12 mM MgCl2, 2 mM ATP, 0.4 mM dithiothreitol, 6
mM phosphocreatine, 10 U/ml phosphocreatine kinase, 0.6 nM
single-stranded circular DNA (molecules), 77 µg RecA
protein, 5.5 µg SSB, and when indicated 6.9 µg MutS and
3.2 µg MutL. Reactions were allowed to preincubate at
37°C for 10 minutes, and strand exchange was initiated by
addition of linear duplex [3H]M13 DNA (Rf DNA linearized by
cleavage with HpaI, 0.6 nM final concentration as
molecules). MutS or MutL was added 1 minute prior to addition
of M13 duplex DNA. Samples (100 µl) were taken as
indicated, quenched with sodium dodecyl sulphate (0.8%), and
extracted with phenol:chloroform:isoamyl alcohol (24:24:1)
equilibrated with 10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA.
The organic phase was back-extracted with 0.5 volume of 50
mM HEPES, pH 5.5. Aqueous layers were combined, washed
with H2O-saturated ether, and relieved of residual ether by
30 minutes incubation at 37°C. The mean length of stable
heteroduplex was then determined using S1 nuclease (10
U/ml) according to Cox and Lehman (Cox, 1981 supra).

Although some strand transfer occurs between fd
and M13 DNAs in the presence of MutS and MutL,
heteroduplex formation is restricted to about one kilobase
of the 6.4 kilobase possible. The MutS and MutL effects
on recombination are due, at least in part, to their
ability to control the branch migration reaction in response
to occurrence of mismatched base pairs.

Other embodiments are within the following
claims.

WHAT IS CLAIMED IS:

1. A diagnostic method for detecting a base
pair mismatch in a DNA duplex, comprising the steps of:

contacting at least one strand of a first DNA
molecule with the complementary strand of a second DNA
molecule under conditions such that base pairing occurs;

contacting a DNA duplex potentially containing
a base pair mismatch with a mispair recognition protein
under conditions suitable for said protein to form a
specific complex only with said DNA duplex having a base
pair mismatch, and not with a DNA duplex lacking a base
pair mismatch, and

detecting any said complex as a measure of the
presence of a base pair mismatch in said DNA duplex.

2. The method of claim 1 wherein said mispair
recognition protein is the product of the mutS gene of E.
coli or species variation thereof, or portion thereof
encoding the recognition domain.

3. The method of claim 2 wherein said protein
recognizes all eight possible base pair mismatches.

4. The method of claim 1 wherein said step for
detecting the DNA:protein complex further includes the
steps of:

contacting said DNA:protein complexes with a
selectively adsorbent agent under conditions such that
said protein:DNA complexes are retained on said agent
while DNA not complexed with protein is not retained, and

measuring the amount of DNA in said retained
complexes.

5. The method of claim 4 wherein said adsorbent
agent is a membranous nitrocellulose filter.

6. The method of claim 1, wherein said step
for detecting said DNA:protein complex further comprises
a step wherein the electrophoretic mobility of said
DNA:protein complex is compared to that of uncomplexed
DNA.

7. The method of claim 2, wherein said mispair
recognition protein is the product of the mutS gene of S.
typhimurium, the hexA gene of S. pneumoniae or the MSH1,
MSH2 genes of yeast.

8. The method of claim 1 wherein said step for
detecting the DNA:protein complex further includes a step
wherein an antibody specific for said DNA base mispair
recognition protein is employed.

9. A method for detecting and localizing a
base pair mismatch in a DNA duplex, comprising the steps
of:

contacting at least one strand of the first DNA
molecule with the complementary strand of the second DNA
molecule under conditions such that base pairing occurs;

contacting the resulting double-stranded DNA
duplexes with a mispair recognition protein under
conditions such that the protein forms specific complexes
with mispairs,

subjecting said duplex molecules to hydrolysis
with an exonuclease under conditions such that said
complex blocks hydrolysis, and

determining the location of said block to
hydrolysis by a suitable analytic method.

10. A method for detecting and localizing a
base pair mismatch in a DNA duplex, comprising the steps
of:

contacting at least one strand of a first DNA
molecule with the complementary strand of a second DNA
molecule under conditions such that base pairing occurs;

contacting the resulting double-stranded DNA
duplexes with a mispair recognition protein under
conditions such that the protein forms specific complexes
with mispairs and thereby directs modification of at least
one strand of the DNA in the resulting DNA:protein
complexes in the vicinity of the DNA:protein complex, and

determining the location of the resulting DNA
modification by a suitable analytical method.

11. The method of claim 10, wherein said
mispair recognition protein is the product of the mutS
gene of E. coli or a functionally homologous protein.

12. The method of claim 10, wherein the step
for modifying the DNA duplex in the vicinity of the
complexed protein comprises contacting said complexes with
a defined mismatch correction system or subset comprising
the following purified components: E. coli MutH, and MutL,
proteins, or species variations thereof, DNA helicase II,
single-stranded DNA binding protein, DNA polymerase III
holoenzyme, exonuclease I, exonuclease VII or RecJ
exonuclease, ATP and one or more dideoxynucleoside
triphosphates, under conditions that produce a
discontinuity in one or both strands of the DNA duplex in
the vicinity of the mismatch.

13. The method of claim 12 wherein said step
for determining the location of said single-stranded gaps
within said DNA duplex further includes the steps of
cleaving said DNA with a single-stranded specific
endonuclease, and comparing the electrophoretic mobilities
of said cleaved fragments with unmodified DNA fragments.

14. The method of claim 10, wherein the step
for modifying the DNA duplex in the vicinity of the
complexed protein comprises:

contacting said complexes with proteins of a
mismatch repair system, ATP and a divalent cation under
conditions such that an endonucleolytic incision is
introduced in one strand of the duplex molecule.

15. The method of claim 14, wherein the
suitable analytic method comprises size fractionation by
electrophoretic mobility under denaturing conditions
relative to unmodified DNA fragments.

16. The method of claim 14, wherein said
proteins of a mismatch repair system are the MutH, MutL,
and MutS proteins of the E. coli methyl-directed mismatch
repair system.

17. A mispair recognition protein which has
been altered to provide means for modifying at least one
strand of the DNA duplex in the vicinity of the bound
mispair recognition protein.

18. The altered mispair recognition protein of
claim 17 wherein said protein is the product of the mutS
gene of E. coli or of another functionally homologous
protein to which is attached an hydroxyl radical cleaving
function.

19. The altered mispair recognition protein of
claim 17, wherein the protein comprises a segment of the
native molecule that contains the mispair recognition
domain.

20. The altered mispair recognition protein of
claim 18, wherein the hydroxyl radical cleaving function
is selected from the group consisting of the
1,10-phenanthroline-copper complex, the EDTA iron complex, and
the copper binding domain of serum albumin.

21. The altered mispair recognition protein of
claim 17, wherein said protein is the product of the mutS
gene of E. coli or of another functionally homologous
protein to which is attached a DNA endonuclease
activity capable of cleaving double-stranded DNA.

22. The altered mispair repair protein of claim
21, wherein said endonuclease activity is provided by the
DNA cleavage domain of FokI endonuclease.

23. A method for detecting and localizing a
base pair mismatch within a DNA duplex, comprising the
steps of:

contacting at least one strand of the first DNA
molecule with the complementary strand of the second DNA
molecule under conditions such that base pairing occurs;

contacting resultant duplex DNA molecules with
an altered mispair recognition protein, under conditions
such that the protein forms specific complexes with a
mispair and thereby directs modification of at least one
strand of the DNA in the resulting DNA:protein complexes
in the vicinity of the DNA:protein complex, and

determining the location of the modification of
the DNA by a suitable analytic method.

24. A method for detecting and localizing A-G 
mispairs in a DNA duplex, comprising the steps of: 

contacting at least one strand of the first DNA 

3 0 molecule with the complementary strand of the second DNA 

molecule under conditions such that base pairing occurs; 

contacting resultant duplex DNA molecules with 
a mispair recognition protein that recognizes A-G mispairs 
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and an apurinic endonuclease or lyase under conditions 
such that in the presence of a mismatch an endonucleolytic 
incision is introduced in the duplex molecule, and 

determining the location of the incision by a 
5 suitable analytic method. 

25. The method of claim 24, wherein said A-G 
mispair recognition protein is the product of the mutY 
gene of E. coli. 

26. The method of claim 24, wherein the
analytical method comprises gel electrophoresis.

27. The method of claims 1, 9, 10, 23, and 24,
wherein the DNA molecules are obtained from the following
sources: different individuals of the same species,
individuals of different species, individuals of different
kingdoms, different tissue types, the same tissue type in
different states of growth, different cell types, cells of
the same type in different states of growth, cells of the
same origin in different stages of development and cells
of the same type that may have undergone differential
somatic mutagenesis.

28. The method of claim 27, wherein one of the
DNA molecules comprises a probe sequence that has been at
least partially characterized.

29. Assay kit for detecting a base pair
mismatch in a DNA duplex comprising one or more of the
following components:

an aliquot of a mispair recognition protein,
an aliquot of control oligonucleotides, and
an exonuclease.

30. The kit of claim 29 or 34, wherein the
mispair recognition protein is the product of the mutS
gene of E. coli.

31. Assay kit for detecting and localizing a
base pair mismatch in a DNA duplex comprising one or more
of the following components:

an aliquot of all or part of a mismatch repair
system;

an aliquot of dideoxynucleoside triphosphates;
and

a single-strand specific endonuclease.

32. The kit of claim 31, wherein the mismatch
repair system includes: the products of the E. coli mutH,
mutL, and mutS genes, or species variations thereof, DNA
helicase II, single-strand DNA binding protein, DNA
polymerase III holoenzyme, exonuclease I, Exo VII
exonuclease or RecJ exonuclease, DNA ligase, and ATP.

33. The kit of claim 31, wherein the mismatch
repair system includes the products of the E. coli mutH,
mutL, and mutS genes, or species variations thereof, and
ATP.

34. Assay kit for detecting and localizing
individual base sequence differences within homologous
regions of two DNA molecules comprising:

an aliquot of an altered mispair recognition
protein.

35. Assay kit for detecting and localizing an
A-G mispair within a DNA duplex comprising one or more of
the following components:

an aliquot of an A-G mispair recognition
protein; and

an aliquot of an apurinic endonuclease or lyase.

36. The kit of claim 35, wherein the A-G
mispair recognition protein is the product of the mutY
gene of E. coli.

37. A method for eliminating DNA molecules
containing one or more base pairing mismatches from a
population of heterohybrid duplex DNA molecules formed by
base pairing of single-stranded DNA molecules obtained
from a first source and a second source, comprising the
steps of:

digesting genomic DNA from said first and said
second source with a restriction endonuclease,

methylating the DNA from one of said
sources,

denaturing said DNA from said first and said
second source,

mixing DNA from said first and said second
source in the presence of a recombinase protein, proteins
of a mismatch repair system that modulate said recombinase
protein, single-strand binding protein, and ATP, under
conditions such that DNA duplexes form in homologous
regions of DNA molecules from said first and said second
source and the presence of a base pair mismatch results in
regions that remain single-stranded, and

removing molecules that contain a said
single-stranded region from said population.

38. The method of claim 37, wherein said 
recombinase protein is the E. coli RecA protein. 

39. The method of claim 37, wherein said
mismatch repair system comprises the E. coli
methyl-directed mismatch repair system and consists of the MutS
and MutL proteins.

40. A method for eliminating DNA molecules
containing one or more mismatches from a population of
heterohybrid duplex DNA molecules formed by a strand
transfer reaction between duplex DNA molecules obtained
from a first source and denatured DNA molecules obtained
from a second source, comprising the steps of:

restriction digesting genomic DNA from said
first and said second sources,

methylating the DNA of said first or said second
source,

denaturing DNA from said second source,

mixing DNA molecules from said first and said
second source in the presence of a protein which catalyzes
a strand transfer reaction, proteins of a mismatch repair
system that modulate said protein which catalyzes a strand
transfer reaction, single-strand binding protein, and ATP,
under conditions such that DNA heteroduplexes form in
homologous regions of DNA molecules from the first and the
second source and the presence of a base pair mismatch
results in regions that remain single-stranded, and

removing molecules that contain a said
single-stranded region from said population.

41. The method of claim 37 or 40, wherein the
removal of said molecules containing a single-stranded
region is by treatment with a single-strand specific
nuclease.

42. The method of claim 37 or 40, wherein the
removal of said molecules containing a single-stranded
region is by chromatography on benzoylated naphthoylated
DEAE.

43. The method of claim 40, wherein said
strand transferase protein is the E. coli RecA protein.

44. The method of claim 40, wherein said mismatch
repair system comprises the E. coli methyl-directed
mismatch repair system and consists of the MutS and MutL
proteins.

45. The method of claim 37 or 40, wherein said
sources of DNA are selected from the group consisting of:
individuals of the same species, individuals of different
species, individuals of different kingdoms, different
tissue types, the same tissue type in different states of
growth, different cell types, cells of the same type in
different states of growth, cells of the same origin in
different stages of development and cells of the same
origin that have undergone differential somatic mutagenesis.

46. A method for genomic mismatch scanning,
wherein the hybridization step comprises the method of
claim 37 or 40.

47. A method of genomic mismatch scanning,
wherein heterohybrid DNA molecules containing base pair
mismatches are removed, without the use of exonuclease
III, comprising the steps of:

contacting a population of heterohybrid DNA
molecules potentially containing a base pair mismatch with
a DNA mismatch repair system in the presence of one or
more dideoxynucleoside triphosphates under conditions such
that a single-stranded region is generated in a DNA molecule
that contained a base pair mismatch, and

removing said molecule containing a
single-stranded region from the population.

48. The method of claim 47, wherein said DNA
mismatch repair system is the E. coli methyl-directed
mismatch repair system.

49. The method of claim 47, wherein the removal
of said molecule containing a single-stranded region is by
chromatography on benzoylated naphthoylated DEAE.

50. The method of claim 47, wherein the removal
of said molecule containing a single-stranded region is by
treatment with a single-strand specific nuclease.

51. A method for fractionating a population of
DNA molecules based upon a mismatch in a subset of said
molecules, wherein a heterohybrid DNA molecule containing
a base pair mismatch is separated from
non-mismatch-containing molecules, without the use of exonuclease III,
comprising the steps of:

contacting a population of heterohybrid DNA
molecules potentially containing a base pair mismatch with
a DNA mismatch repair system and biotinylated nucleotide
triphosphates under conditions such that biotinylated
nucleotides are incorporated into DNA molecules that
contained a base pair mismatch, and

separating said molecule containing biotinylated
nucleotides from those not containing said nucleotides by
binding to avidin.

52. The method of claim 51, wherein said DNA 
mismatch repair system is the E. coli methyl-directed 
mismatch repair system. 

53. A method of genomic mismatch scanning,
wherein a heterohybrid DNA molecule containing a base pair
mismatch is removed, comprising the steps of:

ligating all DNA duplex molecules under such
conditions that monomer circular DNA molecules are formed
prior to contacting with Exo III.

54. A method for correcting base pair
mismatches in a population of DNA duplexes that have been
produced by enzymatic amplification, potentially
containing one or more base pair mismatches, comprising
the step of:

contacting said population of DNA duplexes with
a DNA methylase and a mismatch repair system under
conditions such that base pair mismatches are corrected.



55. The method of claim 54, wherein said
mismatch repair system is the methyl-directed mismatch repair
system of E. coli.

56. A method for removing DNA molecules
containing one or more base pair mismatches in a
population of DNA duplexes that have been produced by
enzymatic amplification, potentially containing one or
more base pair mismatches, comprising the steps of:

contacting said population of DNA duplexes with
a mismatch repair system under conditions such that one or
more components of said mismatch repair system form a
specific complex with a base pair mismatch contained in a
DNA duplex having a base pair mismatch, and

removing said DNA duplex containing said complex
from the population of duplex molecules.

57. The method of claim 56, wherein removal of
said duplex containing said complex is by chromatography
using the MutS protein affixed to a solid support.

58. A method for removing DNA molecules
containing one or more base pair mismatches in a
population of DNA duplexes that have been produced by
enzymatic amplification, potentially containing one or
more base pair mismatches, comprising the steps of:

contacting said population of DNA duplexes with
a mismatch repair system under conditions such that a DNA
duplex molecule containing a base pair mismatch cannot
produce a full-sized product in a subsequent round of
enzymatic amplification.

59. The method of claim 58, wherein said
mismatch repair system comprises the methyl-directed mismatch
repair system of E. coli and consists of the MutS, MutL, and
MutH proteins.

60. A method for removing DNA duplex molecules
containing base pair mismatches in a population of
heteroduplex DNA molecules produced from different
sources, comprising the steps of:

contacting said population of DNA duplex
molecules potentially containing base pair mismatches with
some or all components of a mismatch repair system under
conditions such that said component or components form a
complex with DNA duplex molecules containing a base pair
mismatch, and not with DNA duplex molecules lacking a
base pair mismatch, and

removing DNA molecules containing said complex
or the product of said complex.

61. The method of claim 60, wherein some or all
proteins of the mismatch repair system have been affixed
to a solid support and removal of said complex is by
adsorption.

62. The method of claim 60, wherein said 
complex interacts with other cellular proteins and removal 
of said complex occurs through said interaction. 

63. The method of claim 60, wherein said
conditions include the use of biotinylated nucleotides such
that said nucleotides are incorporated into duplex
molecules that contain a base pair mismatch and such duplexes
are removed by binding to avidin.

64. The method of claim 45, wherein one of said
sources consists of a probe sequence that has been at
least partially characterized.

65. Kit for correcting base pair mismatches in
duplex DNA molecules comprising one or more of the
following purified components:

an aliquot of E. coli MutH, MutL, and MutS
proteins or species variations thereof,
an aliquot of DNA helicase II,
an aliquot of single-strand DNA binding protein,
an aliquot of DNA polymerase III holoenzyme,
an aliquot of exonuclease I,
an aliquot of Exo VII or RecJ,
an aliquot of DNA ligase,
an aliquot of ATP, and
an aliquot of four deoxynucleoside
triphosphates.

66. Assay kit for eliminating DNA molecules
containing one or more base pairing mismatches from a
population of heterohybrid duplex molecules formed by base
pairing of single-stranded DNA molecules obtained from a
first and a second source comprising one or more of the
following components:

an aliquot of proteins of a mismatch repair
system, and

an aliquot of a recombinase protein.

67. The kit of claim 66, wherein the proteins
of the mismatch correction system are the MutS and MutL
proteins of E. coli.

68. Assay kit for removing DNA molecules
containing one or more base pair mismatches comprising:

an aliquot of one or more proteins of a mismatch
repair system that have been affixed to a column support.

69. The assay kit of claim 68, wherein said
protein of the mismatch repair system is the MutS protein
of E. coli.

70. Kit for fractionating a heteroduplex DNA
population into two parts, one consisting of mismatch-free
duplexes and the other consisting of duplexes that contain
a mismatch at the beginning of the procedure, comprising
one or more of the following components:

an aliquot of all components of a complete
mismatch repair system;

an aliquot of biotinylated nucleotides; and

an aliquot of avidin or avidin-based support.

71. The kit of claim 70, wherein the mismatch
repair system is from E. coli and consists of products of
the mutH, mutL, and mutS genes, DNA helicase II,
single-strand DNA binding protein, DNA polymerase III holoenzyme,
exonuclease I, Exo VII exonuclease or RecJ exonuclease,
DNA ligase, and ATP.

72. The kit of claim 34, wherein the mispair
recognition protein is the product of the mutS gene of
E. coli.

73. The method of claim 12 or 47, wherein said
exogenous dideoxynucleoside triphosphates are omitted.